Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between the memories of different compute devices. Heterogeneous systems with CPUs and multiple GPUs, and distributed-memory clusters, are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the min...
Current multicomputers are typically built as interconnected clusters of shared...
In distributed memory multicomputers, local memory accesses are much faster than those i...
Reducing communication overhead is extremely important in distributed-memory message-passing archite...
In this paper we concentrate on embedded parallel architectures with heterogen...
This paper describes a number of optimizations that can be used to support the efficient execution o...
Advances in semiconductor technology enable multiple processor cores to be inte...
On shared memory parallel computers (SMPCs) it is natural to focus on decomposing the computation (...
We present new techniques for compilation of arbitrarily nested loops with affine dependences for di...
This paper describes dstep, a directive-based programming model for hybrid sha...
In this paper, we develop an automatic compile-time computation and data decomposition technique for...
Data-parallel languages allow programmers to use the familiar machine-independent programming style ...
This paper presents a technique for finding good distributions of arrays and suitable loop restructu...
Distributed-memory message-passing machines deliver scalable performance but are difficult to progr...
We propose a set-theoretic model for parallelism. The model is based on separate distributio...