Over the past five years, graphics processing units (GPUs) have had a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations in nuclear and particle physics. While GPUs have been applied with great success to the post-Monte Carlo “analysis” phase, which accounts for a substantial fraction of the workload in a typical LQCD calculation, the initial Monte Carlo “gauge field generation” phase requires capability-level supercomputing, corresponding to O(100) GPUs or more. Such strong scaling has not been previously achieved. In this contribution, we demonstrate that using a multi-dimensional parallelization strategy and a domain-decomposed preconditioner allows us to scale into this regime. We present results...
Scientific computing applications demand ever-increasing performance while traditional microprocesso...
We report on our implementation of the RHMC algorithm for the simulation of lattice QCD with two st...
We revisit the Wilson-Dirac operator, also referred to as Dslash, on multicore vector machines. The Wils...
Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice ...
We present a case-study on the utility of graphics cards to perform massively parallel simulation of...
We present a case-study on the utility of graphics cards to perform massively parallel simulation w...
We present a case study on the utility of graphics cards to perform massively parallel simulation of...
The study of Quantum Chromodynamics (QCD) remains one of the most challenging topics in elementary p...
After a decade where high-end computing was dominated by the rapid pace of improvements to CPU frequ...
Efficient algorithms for the solution of partial differential equations on parallel computers are of...
We present an implementation of phaseless Auxiliary-Field Quantum Monte Carlo (ph-AFQMC) utilizing g...
We present a new massively parallel decomposition for grand canonical Monte Carlo computer simulatio...
The study and design of a very ambitious petaflop cluster exclusively dedicated to Lattice QCD simul...
This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions...
We present $\texttt{SIMULATeQCD}$, HotQCD's software for performing lattice QCD calculations on GPUs...