We present an automated code engine (ACE) that generates optimized kernels for computing integrals in electronic structure theory on a given graphics processing unit (GPU) computing platform. The code generator in ACE creates multiple code variants with different trade-offs between memory use and floating-point operations. Code generation is built on a graph representation, which allows the code generator to be extended to various types of integrals. The code optimizer in ACE determines the optimal code variant and GPU configuration for a given GPU computing platform by scanning over all possible code candidates and choosing the best-performing candidate for each kernel. We apply ACE to the optimization ...
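The exhaustive selection step described in the abstract can be illustrated with a minimal sketch. This is not ACE's implementation: the names `select_best` and `toy_cost`, the variant labels `"recompute"` and `"cache"`, the kernel names, and the cost formula are all hypothetical stand-ins. In a real tuner, the `measure` callback would compile and time each candidate on the target GPU.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

# A candidate pairs a code variant with a launch configuration.
# Both fields are illustrative assumptions, not ACE's actual search space.
Candidate = Tuple[str, int]  # (variant name, threads per block)


def select_best(
    kernels: Iterable[str],
    variants: Iterable[str],
    block_sizes: Iterable[int],
    measure: Callable[[str, Candidate], float],
) -> Dict[str, Candidate]:
    """Scan every (variant, block size) pair and keep the cheapest per kernel."""
    candidates = list(product(variants, block_sizes))
    best: Dict[str, Candidate] = {}
    for kernel in kernels:
        timings = {cand: measure(kernel, cand) for cand in candidates}
        best[kernel] = min(timings, key=timings.get)
    return best


# Hypothetical problem sizes, used only by the toy cost model below.
KERNEL_SIZE = {"small": 1, "large": 4}


def toy_cost(kernel: str, candidate: Candidate) -> float:
    """Made-up cost model standing in for a real on-device timing run.

    "recompute" spends more arithmetic; "cache" spends memory traffic
    that grows faster with kernel size. Block sizes far from 128 pay a
    small penalty. None of these numbers come from measurements.
    """
    variant, block = candidate
    size = KERNEL_SIZE[kernel]
    arithmetic = size * (2.0 if variant == "recompute" else 1.0)
    memory = size ** 2 * (0.0 if variant == "recompute" else 0.5)
    launch_penalty = abs(block - 128) / 128
    return arithmetic + memory + launch_penalty


if __name__ == "__main__":
    best = select_best(
        kernels=["small", "large"],
        variants=["recompute", "cache"],
        block_sizes=[64, 128, 256],
        measure=toy_cost,
    )
    for kernel, (variant, block) in best.items():
        print(f"{kernel}: variant={variant}, threads_per_block={block}")
```

Under this toy model the preferred variant differs between the two kernels, which is the reason a per-kernel scan can be useful: the memory versus floating-point trade-off need not resolve the same way for every integral class.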
Graphics processing units (GPUs) provide a low cost platform for accelerating high performance compu...
We show how compiler technology can generate fast and efficient yet human-readable data-parallel sim...
We introduce a code generator that converts unoptimized C++ code operating on sparse data into vecto...
Stencil computations are a class of algorithms operating on multi-dimensional arrays, which update a...
Modelica users can and want to build more realistic and complex models. This typically means slower ...
2012-05-02: Graphics Processing Units (GPUs) have evolved to devices with teraflop-level performance p...
The relentless demands for improvements in compute throughput and energy efficiency have driven...
It is well acknowledged that the dominant mechanism for scaling processor performance has become to ...
Graphics Processing Units (GPUs) have revolutionized the computing landscape in the past decade and ...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Stencil computations arise in many scientific computing domains, and often represent time-critical ...
The ability to efficiently optimize or re-optimize an algorithm for high performance on a particular...
We propose a generalized method for adapting and optimizing algorithms for efficient execution on mo...