We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. The framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Once annotated with a small number of directives, sequential stencil codes written in C can be automatically parallelized for large-scale GPU clusters. The compiler's most distinctive feature is its ability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to exploit the full potential of GPU clusters. The auto-generated hybrid codes hide the overhead of the various data movements by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-tra...
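For readers unfamiliar with the kind of input such a framework targets, the following is a minimal sketch of a sequential 3D stencil sweep in C. It is an illustration only: the function names `idx` and `stencil7_sweep`, the 7-point averaging kernel, and the x-fastest memory layout are assumptions made here, not taken from the paper, and the framework's actual directive syntax is not shown in the abstract, so none is reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index into an nx*ny*nz grid stored with x varying fastest.
   (Hypothetical helper; the layout is an assumption for illustration.) */
static size_t idx(size_t i, size_t j, size_t k, size_t nx, size_t ny) {
    return i + nx * (j + ny * k);
}

/* One Jacobi-style sweep of a 7-point stencil over the interior points.
   Each interior point becomes the average of itself and its six
   face neighbours; boundary points of `out` are left untouched.
   A directive-based framework would presumably annotate this loop nest;
   the real directive syntax is not given in the text, so it is omitted. */
void stencil7_sweep(const double *in, double *out,
                    size_t nx, size_t ny, size_t nz) {
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    for (size_t k = 1; k + 1 < nz; ++k) {
        for (size_t j = 1; j + 1 < ny; ++j) {
            for (size_t i = 1; i + 1 < nx; ++i) {
                size_t c = idx(i, j, k, nx, ny);
                out[c] = (in[c]
                          + in[c - 1]       + in[c + 1]        /* x neighbours */
                          + in[c - nx]      + in[c + nx]       /* y neighbours */
                          + in[c - nx * ny] + in[c + nx * ny]) /* z neighbours */
                         / 7.0;
            }
        }
    }
}
```

The sweep reads only from `in` and writes only to `out`, so every interior point can be updated independently. That independence is what makes loop nests of this shape candidates for automatic distribution across MPI ranks, GPU threads, and CPU threads.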
2012-05-02 Graphics Processing Units (GPUs) have evolved to devices with teraflop-level performance p...
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural r...
With the recent advent of new heterogeneous computing architectures there is still a lack of paralle...
A high-productivity framework for multi-GPU and multi-CPU computation of stencil application...
Stencil computations are a class of algorithms operating on multi-dimensional arrays, which update a...
Stencil computations arise in many scientific computing domains, and often represent time-critical ...
Original article can be found at: http://portal.acm.org/ Copyright ACM [Full text of this article i...
Graphics Processing Units (GPUs) have been widely adopted to accelerate the execution of HPC workload...
Graphics processor units (GPUs) have evolved to handle throughput-oriented workloads where a...
The shift toward parallel processor architectures has made programming and code generation increasin...
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Fundação de Amparo à Pesquisa do...
Special Section on Parallel, Distributed, and Reconfigurable Computing, and Networking. Graphics proce...
GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target f...
Recent advances in multi-core and many-core processors require programmers to exploit an increasing...