Parallel patterns (e.g., map, reduce) have gained traction as an abstraction for targeting parallel accelerators and are a promising answer to the performance portability problem. However, compiling high-level programs into efficient low-level parallel code is challenging. Current approaches start from a high-level parallel IR and emit GPU code directly in one big step. They use fixed strategies to optimize and map parallelism, exploiting properties of a particular GPU generation, which leads to performance portability issues. We introduce the Lift IR, a new data-parallel IR which encodes OpenCL-specific constructs as functional patterns. Our prior work has shown that this functional nature simplifies the exploration of optimization...
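To make the idea of parallel patterns concrete, the following is a minimal sketch in plain Python, not Lift's actual syntax or API. It expresses a dot product once as a flat composition of map and reduce, and once in a chunked form where each chunk could, in principle, be assigned to a separate group of parallel workers. The helper names `split`, `dot_high_level`, and `dot_chunked` are hypothetical and chosen only for illustration.

```python
from functools import reduce
from operator import add, mul


def split(n, xs):
    """Split xs into consecutive chunks of n elements (last chunk may be shorter)."""
    return [xs[i:i + n] for i in range(0, len(xs), n)]


def dot_high_level(xs, ys):
    """Dot product as a flat composition: map (multiply pairs), then reduce (sum)."""
    return reduce(add, map(mul, xs, ys), 0)


def dot_chunked(xs, ys, chunk=4):
    """Same result, restructured: reduce each chunk independently, then combine.

    The per-chunk reductions do not depend on each other, so a compiler
    could assign them to different parallel units (an assumption of this sketch).
    """
    pairs = list(zip(xs, ys))
    partials = [
        reduce(add, (x * y for x, y in c), 0)  # independent partial sum
        for c in split(chunk, pairs)
    ]
    return reduce(add, partials, 0)  # final combine step
```

Both functions compute the same value; the difference is only in how the work is structured, which is the kind of choice a pattern-based compiler can explore instead of fixing it in advance.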
Algorithmic skeletons simplify software development: they abstract typical patterns of parallelism a...
Original article can be found at : http://portal.acm.org/ Copyright ACM [Full text of this article i...
High-level C++ proxies for the convenient manipulation of subvectors and submatrices on Open...
Graphics Processing Units (GPUs) are now commonplace in computing systems and are the most successf...
Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full pe...
Computers have become increasingly complex with the emergence of heterogeneous hardware combining mu...
Computing systems have become increasingly complex with the emergence of heterogeneous hardware comb...
General-purpose GPU-based systems are highly attractive, as they give potentially massive performanc...
The relentless demands for improvements in compute throughput and energy efficiency have driven...
Application development for modern high-performance systems with Graphics Processing Units (GPUs) cu...
Application development for modern high-performance systems with many cores, i.e., comprising multip...
This work describes my solution to the performance portability problem: between CPUs and GPUs in par...
Heterogeneous computer systems are ubiquitous in all areas of computing, from mobile to high-perfor...