Data and computation alignment is an important part of compiling sequential programs to architectures with non-uniform memory access times. In this paper, we show that elementary matrix methods can be used to determine communication-free alignment of code and data. We also solve the problem of replicating read-only data to eliminate communication. Our matrix-based approach leads to algorithms which are simpler and faster than existing algorithms for the alignment problem.

1 Introduction

A key problem in generating code for non-uniform memory access (NUMA) parallel machines is data and computation placement, that is, determining what work each processor must do and what data must reside in each local memory. The goal of placement is to...
In this paper, an efficient algorithm to simultaneously implement array alignment and data/computati...
When a data-parallel language like Fortran 90 is compiled for a distributed-memory machine, aggregat...
We present the ParAL system which compiles Matlab scripts into C programs with calls to a parallel r...
Data and computation alignment is an important part of compiling sequential programs to architecture...
In this paper, we show that elementary matrix methods can be used to determine communication-free alignm...
An efficient algorithm to simultaneously implement array alignment and data/co...
This paper presents a data layout optimization technique for sequential and parallel progra...
Aggregate data objects (such as arrays) are distributed across the processor memories when compilin...
When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregat...
Minimizing data communication over processors is the key to compiling programs for distribu...
This paper presents a technique for finding good distributions of arrays and suitable loop restructu...
Axis and stride alignment is an important optimization in compiling data-parallel programs for distr...
With the emergence of thread-level parallelism as the primary means for continued improvement of per...
Many optimizations (of programs with loops) used in parallelizing compilers and systolic array desig...
The bandwidth mismatch between processor and main memory is one major limiting problem. Although str...