This paper discusses an abstraction, called the Data Mover, for expressing machine-independent, customized communication algorithms in a variety of block-structured applications. The Data Mover lets its user express data motion using intuitive geometric operations that encapsulate the low-level details of the underlying communication. Communication patterns are expressed as collective operations and are restricted to movement of rectangular array sections. We describe the Data Mover model of communication and present performance results for various applications. The Data Mover currently serves as useful middleware for application library designers, but defines a simple machine-independent interface suitable as a target for a compiler or comp...
We propose a set-theoretic model for parallelism. The model is based on separate distributio...
This paper presents a technique for finding good distributions of arrays and suitable loop restructu...
Many high performance applications spend considerable time packing data into contiguous communicatio...
Programming for parallel architectures that do not have a shared address space is extremely difficul...
On multi-core architectures with software-managed memories, effectively orchestrating dat...
We present efficient support schemes for generalized arrays of parallel data driven objects. The "...
Communicating complex data structures, that is, those containing pointers, across machines is a commo...
The ability to represent, manipulate and optimize data placement and movement between processors in ...
This paper describes a general compiler optimization technique that reduces communication overhead f...
In the paper Supporting Lock-Free Composition of Concurrent Data Objects we introduced a methodology...
Abstraction concepts based on process groups have largely dominated the design and implementation of...
Usually, the components of the distributed software applications are developed using the same techno...
Relocation adjusts machine instructions to account for changes in the locations either of the instru...
Coordination languages for parallel and distributed systems specify mechanisms for creating ...
The performance of a data parallel program is critically dependent on the data decomposition that th...