Application performance on graphics processing units (GPUs), in terms of execution speed and memory usage, depends on the efficient use of hierarchical memory. Enhancing data locality in molecular dynamics simulations is expected to lower the cost of data movement across the GPU memory hierarchy. The work presented in this article analyses the spatial data locality and data reuse characteristics of row-major, Hilbert and Morton orderings and their impact on the performance of molecular dynamics simulations. A simple cache model is presented, and it gives results that are consistent with the timing results for the particle force computation obtained on NVIDIA GeForce GTX 960 and Tesla P100 GPUs. Further analys...
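To make the row-major and Morton orderings named in the abstract concrete, the following minimal sketch computes a sort key for each cell of a cubic grid under both schemes. It is an illustration only, not the article's implementation: the function names (`row_major_key`, `morton_key`, `cell_order`), the cubic `n × n × n` grid, and the 10-bits-per-axis limit are all assumptions made for this example, and the Hilbert ordering is omitted for brevity.

```python
from itertools import product


def row_major_key(ix, iy, iz, n):
    """Row-major index of cell (ix, iy, iz) in an n x n x n grid (x varies fastest)."""
    return ix + n * (iy + n * iz)


def morton_key(ix, iy, iz, bits=10):
    """Morton (Z-order) key: interleave the low `bits` bits of ix, iy and iz.

    Bit b of ix goes to key bit 3*b, of iy to 3*b + 1, of iz to 3*b + 2.
    The default of 10 bits per axis is an assumption for this sketch.
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def cell_order(n, key):
    """Return all cells of an n x n x n grid sorted by the given key function."""
    return sorted(product(range(n), repeat=3), key=key)


if __name__ == "__main__":
    n = 4
    by_rows = cell_order(n, lambda c: row_major_key(*c, n))
    by_morton = cell_order(n, lambda c: morton_key(*c))
    print("row-major:", by_rows[:8])
    print("Morton:   ", by_morton[:8])
```

Under the Morton key, the first eight cells visited form a compact 2 × 2 × 2 block, whereas row-major order walks along a full row in x before moving on. Particles stored in the order of their cell keys inherit this layout in memory.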
Copyright: © 2015 Materials Research Society. This article discusses novel algorithms for molecular-dy...
In several fields of research, molecular dynamics simulation techniques are exploited to evaluate th...
Data locality is a well-recognized requirement for the development of any parallel application, but ...
General-purpose computing on GPUs is widely adopted for scientific applications, providing inexpensi...
In this thesis we look at a performance bottleneck of running molecular dynamics code on GPGPU devic...
Molecular dynamics (MD) simulation has broad applications, but its irregular memory-access pattern m...
Traditionally, GPUs only had programmer-managed caches. The advent of hardware-managed caches accele...
Numerical simulations using supercomputers are producing an ever-growing amoun...
This article presents the GPU parallelization of new SD- and DPD-type algorithms for molecular dynam...
Coarse grain (CG) molecular models have been proposed to simulate complex systems with lower compu...
As GPUs' compute capabilities grow, their memory hierarchy increasingly becomes a bottleneck. C...
Thesis (Ph.D.), University of Rochester, Department of Computer Science, 2017. On modern processors, ...
Unstructured-mesh based numerical algorithms such as finite volume and finite element algorithms for...
We present projection sorting, an algorithmic approach to determining pairwise short-range forces be...