This paper presents compilation techniques to compress holes, which are caused by the non-unit alignment stride in a two-level data-processor mapping. Holes are the memory locations mapped by useless template cells. To fully utilize the memory space, memory holes should be removed. In a two-level data-processor mapping, there is a repetitive pattern for array elements mapped onto processors. We classify blocks into classes and use a class table to record the distribution of each class in the first repetitive data distribution pattern. Similarly, data distribution on a processor also has a repetitive pattern. We use a compression table to record the distribution of each block in the first repetitive data distribution pattern on a processor. B...
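To make the notion of a hole concrete, the following is a minimal brute-force sketch. It does not reproduce the paper's class table or compression table. It assumes a one-dimensional array whose element `A[i]` is aligned to template cell `stride * i + offset`, with the template distributed cyclic(`block`) over `procs` processors; the function name and all parameter names are hypothetical and chosen only for illustration.

```python
def local_layout(n, stride, offset, block, procs, proc):
    """Return (uncompressed, compressed) local layouts for one processor.

    Assumed mapping (illustrative only): A[i], 0 <= i < n, is aligned to
    template cell stride*i + offset, and the template is distributed
    cyclic(block) over `procs` processors.
    """
    last_cell = stride * (n - 1) + offset
    # Template cells owned by `proc`, in increasing (local) order.
    owned = [t for t in range(last_cell + 1) if (t // block) % procs == proc]
    # Uncompressed layout: one local slot per owned template cell.
    # None marks a hole, i.e. a template cell with no array element.
    uncompressed = [
        (t - offset) // stride
        if t >= offset and (t - offset) % stride == 0
        else None
        for t in owned
    ]
    # Compressed layout: the same elements with the holes squeezed out.
    compressed = [i for i in uncompressed if i is not None]
    return uncompressed, compressed


if __name__ == "__main__":
    raw, packed = local_layout(n=8, stride=3, offset=0, block=4, procs=2, proc=0)
    print(raw)     # array indices and holes in local template order
    print(packed)  # array indices stored contiguously
```

With a stride of 3, only every third template cell carries an array element, so most slots in the uncompressed layout are holes. This sketch enumerates every template cell, which is exactly the cost that a table-driven scheme built on the repetitive pattern is meant to avoid.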
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
Address generation for compiling programs, written in HPF, to executable SPMD code is an...
Increased programmability for concurrent applications in distributed systems requires automatic supp...
This paper presents compilation techniques used to compress holes, which are caused by the nonunit a...
A bitmap index is a type of database index in which querying is implemented using logical operations...
A challenge in the design of high performance computer systems is how to transfer data efficiently b...
This paper introduces a dynamic layout optimization strategy to minimize the number of cycles spent ...
processor architecture, memory system and management, cache memory, hardware and software technique,...
This paper presents an efficient compilation technique to generate the local memory acce...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/16...
The trend in high-performance microprocessor design is toward increasing computational power on the ...
Storage mapping optimization is a flexible approach to folding array dimensions in numerical codes. ...
The system efficiency and throughput of most architectures are critically dependent on the ability o...
With increasing numbers of cores, future CMPs (Chip Multi-Processors) are like...