A common feature of many scalable parallel machines is non-uniform memory access: a processor can access data in its local memory ten to a thousand times faster than it can access non-local data. In addition, when a number of remote accesses must be made, it is usually more efficient to use block transfers of data than to use many small messages. To run well on such machines, software must exploit these features. We believe it is too onerous for a programmer to do this by hand, so we have been exploring the use of restructuring compiler technology for this purpose. In this paper, we start with a language like FORTRAN-D with user-specified data distributions and develop a systematic loop transformation strategy c...
Abstract: The delivered performance on modern processors that employ deep memory hierarchies is close...
In this tutorial, we address the problem of restructuring a (possibly sequential) program to improve...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cac...
A common feature of many scalable parallel machines is non-uniform memory access - a processor can ...
In this paper, we describe a framework for loop transformations and code generation for NUMA (non-unifo...
A common feature of many scalable parallel machines is non-uniform memory access (NUMA) --- data acc...
This paper presents a technique for finding good distributions of arrays and suitable loop restructu...
In this paper, we discuss a loop transformation framework that is based on integer non-singular ma...
In this paper, we discuss a loop transformation framework that is based on integer non-singular mat...
Over the past 20 years, increases in processor speed have dramatically outstripped performance incre...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/18...
Many applications are memory intensive and thus are bounded by memory latency and bandwidth. While i...
In recent years, methods for analyzing and parallelizing sequential code using data analysis and loo...
The delivered performance on modern processors that employ deep memory hierarchies is closely relate...
Current high-performance multicore processors provide users with a non-uniform memory access model (...