In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transformation (FFT) kernel of Quantum ESPRESSO, when scaling to a full KNL node. An increased performance in the FFTXlib will likewise increase the performance of the entire Quantum ESPRESSO code, one of the most used plane-wave DFT codes in the materials science community. Our approach focuses on, first, overlapping computation and communication and, second, decreasing resource contention for higher compute efficiency. To achieve this, we use the OmpSs programming model based on task dependencies. We allow overlapping of computation and communication by converting all steps of the FFT into tasks following a flow dependency. In the same way, we d...
With the recent development of faster and more complex Multiprocessor System-on-Chips (MPSoCs), a lar...
Fourier methods have revolutionized many fields of science and engineering, such as astronomy, medic...
In this session we show, in two case studies, how the roofline feature of Intel Advisor has been uti...
Li, Xiaoming. Generating high-performance Fast Fourier Transform (FFT) libraries is an important researc...
Manycores are consolidating in the HPC community as a way of improving performance while keeping power e...
In this paper, we present an early version of a SYCL-based FFT library, capable of running on all ma...
The Fast Fourier Transform (FFT) is one of the most widely used algorithms in engineering and scient...
Automatic library generators, such as ATLAS [11], Spiral [8] and FFTW [2], are promising technologi...
Increased complexity of memory systems to ameliorate the gap between the speed of processors and mem...
Li, Xiaoming. Graphics Processing Units (GPUs) have been proven to be a promising platform to accelerate ...
Several SOA (state of the art) self-tuning software libraries exist, such as the Fastest Fourier Tra...
This paper evaluates the efficacy of recent commercial processing-in-memory (PIM) solutions to accel...
Paper presented at CUG 2010, Edinburgh. CP2K is a freely available and increasingly popular Density Fu...
FPGA-based accelerators demonstrated high energy efficiency compared to GPUs and CPUs. However, sing...