Due to their potentially high peak performance and energy efficiency, GPUs are increasingly popular for scientific computations. However, the complexity of the architecture makes it difficult to write code that achieves high performance. Two of the most important factors in achieving high performance are how the GPU memory hierarchy is used and how work is mapped to threads and blocks. The dominant frameworks for GPU computing, CUDA and OpenCL, leave these decisions largely to the programmer. In this work, we address this in part by proposing a technique that simultaneously manages use of the GPU's low-latency shared memory and chooses the granularity with which to divide the work (block size). We show that a relatively simple h...
Heterogeneous architectures consisting of general-purpose CPUs and throughput-optimized GPUs are ...
This book chapter proposes to draw several development methodologies to obtain...
Using multi-GPU systems, including GPU clusters, is gaining popularity in scientific computing. Howe...
GPUs are an increasingly popular implementation platform for a variety of general purpose applicatio...
The NVIDIA graphics processing units (GPUs) are playing an important role as general purpos...
Many applications with regular parallelism have been shown to benefit from using Graphics Processing...
Graphics Processing Units (GPUs) have been widely adopted to accelerate the execution of HPC workload...
Stencil computation is of paramount importance in many fields, in image processing,...
In this dissertation, we explore multiple designs for a Distributed Transactional Memory framework f...
2018-02-23. Graphics Processing Units (GPUs) are designed primarily to execute multimedia, and game re...
This paper presents a novel optimizing compiler for general purpose computation on graphics processi...
It is commonplace for graphics processing units or GPUs today to render extremely complex 3D scenes ...
Programming models such as CUDA and OpenCL allow the programmer to specify the independence of threa...
Heterogeneous architectures consisting of general-purpose CPUs and throughput-optimized GPU...