Kaczmarek O, Schmidt C, Steinbrecher P, Wagner M. Conjugate gradient solvers on Intel Xeon Phi and NVIDIA GPUs. In: Bonati C, Lamanna G, D'Elia M, Sozzi M, eds. Proceedings, GPU Computing in High-Energy Physics (GPUHEP2014): Pisa, Italy, September 10-12, 2014. 2015: 157-162. Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the Fermion Matrix. This part is therefore frequently optimized for various HPC architectures. Here we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverting multiple vectors at the same time, we obtain a performance greater than 300 GFlop/s on bot...
Abstract—A new sparse high performance conjugate gradient benchmark (HPCG) has been recently release...
We explore the diagonalization methods used in the PWscf (Plane-Wave Self Consistent Field), a key ...
Scientific computing applications demand ever-increasing performance while traditional microprocesso...
Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the ...
Mukherjee S, Kaczmarek O, Schmidt C, Steinbrecher P, Wagner M. HISQ inverter on Intel Xeon Phi and N...
Abstract. Lattice Quantum Chromodynamics (LQCD) is currently the only known model independent, non pe...
In this paper we describe a single-node, double precision Field Programmable Gate Array (FPGA) imple...
Results of porting parts of the Lattice Quantum Chromodynamics code to modern FPGA devices are prese...
The conjugate gradient (CG) algorithm is among the most essential and time consuming parts of lattic...
We present the first GPU-based conjugate gradient (CG) solver for lattice QCD with domain-wall fermi...
Lattice QCD calculations require significant computational effort, with the dominant fraction of res...
FPGA devices used in the HPC context promise an increased energy efficiency, enhancing the computing...
The Conjugate Gradient method is a popular iterative method to solve a system of linear equations an...
DOI: will be assigned Lattice Quantum Chromodynamics simulations typically spend most of the runtime...