Supercomputers need not only to have fast functional units, but also to have rapid access to massive quantities of data. Virtual memory paging and physically distributed memory systems both attempt to provide this large data space, but performance of a computer system using either memory organization is highly dependent on the page reference pattern and the number of pages available locally. Despite this, surprisingly little work has been done toward using the compiler to optimize memory system performance. In this paper, we introduce compiler techniques which use a combination of data layout and code transformation to improve paging performance for compiled programs. These same techniques can also be applied manually to improve performance...
Over the past decade, microprocessor design strategies have focused on increasing the computational ...
The memory system is a major bottleneck in achieving high performance and energy efficiency for vari...
The delivered performance on modern processors that employ deep memory hierarchies is closely relate...
The system efficiency and throughput of most architectures are critically dependent on the ability o...
Effective memory hierarchy utilization is critical to the performance of modern multiprocessor archi...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
This paper describes transformation techniques for out-of-core programs (i.e., those that deal with...
While CPU speed has been improved by a factor of 6400 over the past twenty years, memory bandwidth h...
© 1994 ACM. In the past decade, processor speed has become significantly faster than memory speed. S...
The performance of the memory hierarchy has become one of the most critical elements in the performa...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Many applications are memory intensive and thus are bounded by memory latency and bandwidth. While i...
This paper describes an algorithm to optimize cache locality in scientific codes on uniprocessor and...
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 1997. Simultaneously published...