This paper presents compilation techniques for compressing holes, which are caused by a nonunit alignment stride in a two-level data-processor mapping. Holes are the local memory locations that correspond to template cells to which no array element is aligned. To fully utilize the memory space, these holes should be removed. In a two-level data-processor mapping, the array elements mapped onto processors follow a repetitive pattern. We classify blocks into classes and use a class table to record the distribution of each class in the first repetitive data distribution pattern. Similarly, the data distribution on each processor also has a repetitive pattern. We use a compression table to record the distribution of each block in the first repetitive data distribution pattern ...
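To make the notion of a hole concrete, the following is a minimal sketch, not the class-table or compression-table method of the paper. It assumes an HPF-style setting in which array element i is aligned to template cell stride*i + offset and the template is distributed block-cyclically (block size `block`) over `procs` processors. The function names and parameters are hypothetical, and the compression shown is plain enumeration and ranking rather than the table-driven scheme the abstract describes.

```python
def local_layout(n, stride, offset, block, procs):
    """Map array elements 0..n-1 to (local template address, element index).

    Assumed mapping: element i -> template cell stride*i + offset,
    template distributed cyclic(block) over `procs` processors.
    """
    layout = {p: [] for p in range(procs)}
    for i in range(n):
        cell = stride * i + offset          # first level: alignment
        blk = cell // block                 # global block number
        proc = blk % procs                  # second level: distribution
        local = (blk // procs) * block + cell % block
        layout[proc].append((local, i))
    return layout


def compress(layout):
    """Give each element a dense local address by ranking the used cells.

    Unused local addresses (the holes) simply receive no rank.
    """
    return {
        proc: {elem: rank for rank, (_, elem) in enumerate(sorted(cells))}
        for proc, cells in layout.items()
    }


if __name__ == "__main__":
    layout = local_layout(n=8, stride=3, offset=0, block=4, procs=2)
    for proc, cells in layout.items():
        used = [addr for addr, _ in cells]
        holes = [a for a in range(max(used) + 1) if a not in used]
        print(proc, "used:", used, "holes:", holes)
    print(compress(layout))
```

With a stride of 3, two out of every three local template addresses hold no array element, which is the wasted space the compression step reclaims. Enumerating every element as above costs time proportional to the array size; exploiting the repetitive pattern mentioned in the abstract is what would avoid that cost.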
The delivered performance on modern processors that employ deep memory hierarchies is closely relate...
With increasing numbers of cores, future CMPs (Chip Multi-Processors) are like...
The system efficiency and throughput of most architectures are critically dependent on the ability o...
This paper presents compilation techniques to compress holes, which are caused by the non-unit align...
A bitmap index is a type of database index in which querying is implemented using logical operations...
This paper presents an efficient compilation technique to generate the local memory acce...
A challenge in the design of high performance computer systems is how to transfer data efficiently b...
This paper introduces a dynamic layout optimization strategy to minimize the number of cycles spent ...
processor architecture, memory system and management, cache memory, hardware and software technique,...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/16...
Address generation for compiling programs, written in HPF, to executable SPMD code is an...
Increased programmability for concurrent applications in distributed systems requires automatic supp...
Storage mapping optimization is a flexible approach to folding array dimensions in numerical codes. ...
The trend in high-performance microprocessor design is toward increasing computational power on the ...