The large latency of memory accesses in modern computers is a key obstacle in achieving high processor utilization. As a result, a variety of techniques have been devised to hide this latency. These techniques range from cache hierarchies to various prefetching and memory management techniques for manipulating the data present in the caches. In DSP applications, the existence of large numbers of uniform nested loops makes the issue of loop scheduling very important. In this paper, we propose a new memory management technique that can be applied to computer architectures with three levels of memory, the scheme generally adopted in contemporary computer architecture. This technique takes advantage of access pattern information that is availab...
This paper presents a novel pointer prefetching technique, called multi-chain prefetching. Multi-cha...
While many parallel applications exhibit good spatial locality, other important codes in areas like ...
Memory latency is becoming an increasingly important performance bottleneck as the gap between processor ...
Over the last 20 years, the performance gap between CPU and memory has been steadily increasing. As ...
Partition Scheduling with Prefetching (PSP) is a memory latency hiding technique which combines the ...
In this paper, we propose a novel loop scheduling technique based on multi-dimensional retiming in a...
The widening gap between processor and memory performance is the main bottleneck for modern computer...
As the gap between processor and memory speeds widens, program performance is increasingly dependent...
grantor: University of Toronto. The latency of accessing instructions and data from the memo...
Recent technological advances are such that the gap between processor cycle times and memory cycle t...
In order to improve performance, future parallel systems will continue to increase the processing po...
Memory accesses in modern processors are both far slower and vastly more energy-expensive than the a...
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarch...
Data-intensive applications often exhibit memory referencing patterns with little data reuse, result...
Summarization: By examining the rate at which successive generations of processor and DRAM cycle tim...