We present an automatic, static program transformation that schedules and generates efficient memory transfers between a host computer and its hardware accelerator, addressing a well-known performance bottleneck. Our approach relies on two simple heuristics: perform transfers to the accelerator as early as possible, and delay transfers back from the accelerator as long as possible. We implemented this transformation as a middle-end compilation pass in the PIPS/Par4All compiler. The generated code avoids redundant communications caused by data reuse between kernel executions, and the instructions that initiate transfers are scheduled effectively at compile time. We present experimental results obtained with the Pol...
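The effect of keeping data resident on the accelerator between kernel executions can be illustrated with a small counting model. This is only a sketch under stated assumptions, not the PIPS/Par4All pass itself: the `Kernel` class, the two counting functions, and the three-kernel `pipeline` are hypothetical names invented for illustration, the model assumes the host does not touch any array between kernel launches, and it counts transfers at run time whereas the transformation described above decides them statically at compile time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """Hypothetical kernel description: the arrays it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def naive_transfers(kernels):
    """Copy every input in before each kernel and every output back after it."""
    return sum(len(k.reads) + len(k.writes) for k in kernels)


def scheduled_transfers(kernels, host_reads_after):
    """Count transfers when arrays stay on the accelerator between kernels.

    Assumption: the host neither reads nor writes any array until the whole
    kernel sequence has finished, so copies back can be delayed to the end.
    """
    on_device = set()  # arrays with a valid copy on the accelerator
    dirty = set()      # arrays whose newest value lives only on the accelerator
    transfers = 0
    for k in kernels:
        # Host-to-accelerator copy only for inputs that are not resident yet.
        transfers += len(k.reads - on_device)
        on_device |= k.reads | k.writes
        dirty |= k.writes
    # Accelerator-to-host copy only for results the host actually needs.
    transfers += len(dirty & set(host_reads_after))
    return transfers


# Hypothetical pipeline: each kernel consumes the previous kernel's output.
pipeline = [
    Kernel("k1", frozenset({"a"}), frozenset({"b"})),
    Kernel("k2", frozenset({"b"}), frozenset({"c"})),
    Kernel("k3", frozenset({"a", "c"}), frozenset({"d"})),
]

if __name__ == "__main__":
    print("naive:", naive_transfers(pipeline))
    print("scheduled:", scheduled_transfers(pipeline, {"d"}))
```

In this model the intermediate arrays `b` and `c` never travel back to the host, and `a` is sent only once even though two kernels read it. Those are the redundant communications the abstract refers to.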
Since the beginning of the 2000s, the raw performance of processors stopped its exponential increase...
Accelerators, such as GPUs and Intel Xeon Phis, have become the workhorses of high-performance compu...
This work studies programmability enhancing abstractions in the context of accelerators and heteroge...
The computing power available in hybrid machines based on accelera...
SIMD hardware accelerators offer an alternative to manycores when energy consumption and performance ...
Heterogeneous multicores like GPGPUs are now commonplace in modern computing systems. Although heter...
During the past decade, accelerators, such as NVIDIA CUDA GPUs and Intel Xeon Phis, have seen an inc...
In the past decade, accelerators, commonly Graphics Processing Units (GPUs), have played a key role ...
Recent compilers comprise an incremental way for converting software to...
Hardware accelerators, such as FPGA boards or GPUs, are an interesting alternative or a valuable comp...
Current application constraints are pushing for higher computation power whil...
Distributed-memory multicomputers, such as the Intel iPSC/860, the Intel Paragon, the IBM SP-1/SP-2...
Since the early 2000s, the raw performance of processor cores has stopped its increa...
The goal of Partitioned Global Address Space (PGAS) languages is to improve programmer productivity ...
Communication coalescing is a static optimization that can reduce both communication frequency and r...