Irregular applications pose challenges in optimizing communication, due to the difficulty of analyzing irregular data accesses accurately and efficiently. This challenge is especially acute when translating irregular shared-memory applications to message-passing form for clusters. The lack of effective irregular data analysis in the translation system results in unnecessary or redundant communication, which limits application scalability. In this paper, we present a Lean Distributed Shared Memory (LDSM) system, which features a fast and accurate irregular data access (IDA) analysis. The analysis uses a region-based diff method and makes use of a runtime library that is optimized for irregular applications. We describe three optimizations that...
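The region-based diff idea in the abstract can be illustrated with a minimal sketch: partition shared data into fixed-size regions, keep a "twin" copy before local writes, and at a synchronization point compare each region against its twin so that only modified regions are communicated. The region size, data layout, and function names below are illustrative assumptions, not the LDSM implementation.

```python
# Sketch of a region-based diff for a software DSM layer (illustrative only).

REGION_SIZE = 4  # elements per region; a real system would tune this


def snapshot(data):
    """Keep a 'twin' copy of the shared array before local writes."""
    return list(data)


def region_diff(twin, data):
    """Compare data against its twin, region by region, and return
    only the modified regions as {region_index: new_values}."""
    diffs = {}
    for start in range(0, len(data), REGION_SIZE):
        region = data[start:start + REGION_SIZE]
        if region != twin[start:start + REGION_SIZE]:
            diffs[start // REGION_SIZE] = region
    return diffs


def apply_diffs(data, diffs):
    """Merge received region diffs into another node's replica."""
    for idx, region in diffs.items():
        data[idx * REGION_SIZE:idx * REGION_SIZE + len(region)] = region


# Example: a node makes two scattered (irregular) writes; only the
# two regions containing them need to be sent, not the whole array.
shared = [0] * 12
twin = snapshot(shared)
shared[1] = 7
shared[9] = 3
msg = region_diff(twin, shared)   # touches regions 0 and 2 only
replica = [0] * 12
apply_diffs(replica, msg)
```

The point of the sketch is the communication saving: the message carries two small regions rather than the full array, which is where diff-based DSM systems reduce traffic for irregular access patterns.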
The parallelization of several applications results in unstructured data accesses on coarse-grained, ...
In this paper we present several algorithms for performing all-to-many personalized communication on...
This paper describes a technique for improving the data reference locality of parallel programs usi...
This paper describes a number of optimizations that can be used to support the efficient execution o...
OpenMP has emerged as the de facto standard for writing parallel programs on shared address space pl...
In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming ...
In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming ...
Applications with irregular accesses to shared state are one of the most challenging computational p...
In previous work, we have proposed techniques to extend the ease of shared-memory parallel programmi...
Generalizable approaches, models, and frameworks for irregular application scalability are an old yet...
Irregular computation problems underlie many important scientific applications. Although these probl...
Parallelizing sparse irregular applications on distributed memory systems poses serious scalability c...
Reducing communication overhead is crucial for improving the performance of programs on distributed-...
Distributed shared-memory systems provide scalable performance and a convenient model for parallel p...