Researchers have proposed several data and computation transformations to improve locality in irregular scientific codes. We experimentally compare their performance and present GPART, a new technique based on hierarchical clustering. Quality partitions are constructed quickly by clustering multiple neighboring nodes with priority on nodes with high degree, and repeating a few passes. Overhead is kept low by clustering multiple nodes in each pass and considering only edges between partitions. Experimental results show GPART matches the performance of more sophisticated partitioning algorithms to within 6%-8%, with a small fraction of the overhead. It is thus useful for optimizing programs whose running times are not known
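The clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' GPART implementation: the function name `gpart_like_clusters`, the `max_size` and `passes` parameters, and the tie-breaking by node id are all assumptions made for illustration. The sketch visits clusters in order of decreasing degree, lets each absorb several neighboring clusters per pass while a size limit allows, and then coarsens the graph so that later passes look only at edges between clusters.

```python
from collections import defaultdict


def gpart_like_clusters(num_nodes, edges, max_size, passes=3):
    """Greedy hierarchical clustering sketch (hypothetical, not the paper's code).

    Returns a list of clusters; each cluster is a list of original node ids.
    """
    # Every node starts as its own cluster.
    members = {v: [v] for v in range(num_nodes)}
    adj = {v: set() for v in range(num_nodes)}
    for u, v in edges:
        if u != v:  # self-edges carry no locality information
            adj[u].add(v)
            adj[v].add(u)

    for _ in range(passes):
        owner = {}  # cluster id -> id of the cluster that absorbed it
        size = {}   # surviving cluster id -> number of original nodes
        # High-degree clusters get priority; ties are broken by id (assumption).
        for c in sorted(members, key=lambda c: (-len(adj[c]), c)):
            if c in owner:
                continue
            owner[c] = c
            size[c] = len(members[c])
            # Absorb multiple unclaimed neighbors while the size limit allows.
            for n in sorted(adj[c]):
                if n not in owner and size[c] + len(members[n]) <= max_size:
                    owner[n] = c
                    size[c] += len(members[n])
        if len(size) == len(members):
            break  # nothing merged in this pass, so further passes cannot help

        # Coarsen: keep only edges that connect different clusters.
        new_members = defaultdict(list)
        new_adj = {c: set() for c in size}
        for c, target in owner.items():
            new_members[target].extend(members[c])
            for n in adj[c]:
                if owner[n] != target:
                    new_adj[target].add(owner[n])
        members, adj = dict(new_members), new_adj

    return list(members.values())


# Two triangles joined by one bridge edge (2-3), clusters limited to 3 nodes.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = gpart_like_clusters(6, edges, max_size=3)
# Listing nodes cluster by cluster gives a locality-friendly data ordering.
order = [v for cluster in clusters for v in cluster]
```

Because the heuristic is greedy, the partitions it finds depend on the visiting order and on `max_size`; the sketch makes no claim about matching the partition quality or overhead figures reported above.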
Applications with irregular accesses to shared state are one of the most challenging computational p...
Irregular applications frequently exhibit poor performance on contemporary computer architectures, i...
An important class of scientific codes access memory in an irregular manner. Because irregular acce...
This paper describes a technique for improving the data reference locality of parallel programs usi...
Many large-scale computational applications contain irregular data access patterns related to unstru...
In most cases of distributed memory computations, node programs are executed on processors according...
Irregular reduction operations are the core of many large scientific and engineering appli...
Numerical software for sequential or parallel machines with memory hierarchies can benefit from loca...
Exploiting locality of reference is key to realizing high levels of performance on modern p...
In most computer systems, page fault rate is currently minimized by generic page replacement algorit...
Data-parallel languages, such as High Performance Fortran or Fortran D, provide a machin...
Emerging applications in areas such as bioinformatics, data analytics, semantic databases and knowle...
In this paper, we propose a communication cost reduction computes rule for irregular loop partitioning...