In HPC, data redistributions (reorganizations) are used in parallel applications to improve performance and to provide data-locality compatibility with sequences of parallel operations. Data reorganization refers to changing the logical and physical arrangement of data, such as dense arrays or matrices distributed over the peer processes of a parallel program. It can be achieved by applying transformations such as transpositions or rotations, or by changing how the data is mapped onto a P-by-Q process grid; all of these are accomplished either with message passing or with distributed shared memory operations. In this project, we restrict ourselves to a distributed memory model and message passing; we use neither a shared-memory model nor distributed shared memory.
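As a concrete illustration of a message-passing redistribution, the sketch below moves a one-dimensional array from a block distribution to a cyclic distribution with MPI_Alltoallv. It is a minimal, hypothetical example rather than code from this project: the local block length L, the packing scheme, and the assumption that the global length divides evenly among the ranks are choices made only for this illustration.

/* Hypothetical sketch: redistribute a 1-D array from a block distribution
 * to a cyclic distribution with MPI_Alltoallv.  Assumes the global length
 * (L * nprocs) divides evenly among the ranks; all names are illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int L = 8;   /* local block length (assumed); global length is L * nprocs */

    /* Block distribution: rank r owns global indices [r*L, (r+1)*L). */
    double *block = malloc(L * sizeof *block);
    for (int j = 0; j < L; ++j)
        block[j] = (double)(rank * L + j);        /* store the global index */

    /* Under a cyclic map, global index g lands on rank g % nprocs.
     * Count how many local elements go to each destination rank.     */
    int *scount = calloc(nprocs, sizeof(int));
    int *sdispl = calloc(nprocs, sizeof(int));
    int *rcount = calloc(nprocs, sizeof(int));
    int *rdispl = calloc(nprocs, sizeof(int));
    for (int j = 0; j < L; ++j)
        scount[(rank * L + j) % nprocs]++;
    for (int q = 1; q < nprocs; ++q)
        sdispl[q] = sdispl[q - 1] + scount[q - 1];

    /* Pack the send buffer so elements bound for the same rank are
     * contiguous and ordered by increasing global index.             */
    double *sendbuf = malloc(L * sizeof *sendbuf);
    int *fill = calloc(nprocs, sizeof(int));
    for (int j = 0; j < L; ++j) {
        int q = (rank * L + j) % nprocs;
        sendbuf[sdispl[q] + fill[q]++] = block[j];
    }

    /* Exchange the counts, then the data itself. */
    MPI_Alltoall(scount, 1, MPI_INT, rcount, 1, MPI_INT, MPI_COMM_WORLD);
    int total = rcount[0];
    for (int q = 1; q < nprocs; ++q) {
        rdispl[q] = rdispl[q - 1] + rcount[q - 1];
        total += rcount[q];
    }
    double *cyclic = malloc(total * sizeof *cyclic);
    MPI_Alltoallv(sendbuf, scount, sdispl, MPI_DOUBLE,
                  cyclic,  rcount, rdispl, MPI_DOUBLE, MPI_COMM_WORLD);

    /* cyclic[] now holds this rank's share of the cyclic distribution,
     * ordered by increasing global index (local index = g / nprocs).  */
    printf("rank %d owns %d cyclic elements, first = %g\n",
           rank, total, total ? cyclic[0] : -1.0);

    free(block); free(sendbuf); free(cyclic);
    free(scount); free(sdispl); free(rcount); free(rdispl); free(fill);
    MPI_Finalize();
    return 0;
}

The same counts-and-displacements pattern generalizes to remapping data across a P-by-Q process grid; only the rule that computes the destination rank and destination offset of each element changes.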