The execution time of programs that have large working sets is substantially increased by the overhead of retrieving data from the memory system. Even when a first-level cache is integrated with the CPU, the memory overhead may increase the total execution time by 50-300%. In this dissertation we present a cost-effective memory system that uses a novel address predictor to reduce the latency and a distributed second-level cache to increase the bandwidth of the memory system. We have studied the inter-arrival times of requests to main memory as well as the memory address request patterns for a workload of large-scale programs. From this study we observed that the CPU address request patterns to main memory repeat during program execution. We p...
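The observation that main-memory address patterns repeat can be made concrete with a minimal sketch. The following is not the dissertation's predictor; it is an illustrative last-successor ("Markov") table that remembers, for each address, the address that followed it last time, so a repeating request sequence becomes predictable on its second pass.

```python
# Illustrative sketch only (assumed design, not the dissertation's):
# a last-successor address predictor exploiting repeating request patterns.
class LastSuccessorPredictor:
    def __init__(self):
        self.next_of = {}  # maps an address to its last observed successor
        self.prev = None   # most recently observed address

    def predict(self):
        # Predicted next request address, or None if the pattern is unknown.
        return self.next_of.get(self.prev)

    def observe(self, addr):
        # Record addr as the successor of the previous request.
        if self.prev is not None:
            self.next_of[self.prev] = addr
        self.prev = addr

# A repeating trace: the first pass trains the table, the second pass hits.
pred = LastSuccessorPredictor()
trace = [0x100, 0x180, 0x240, 0x100, 0x180, 0x240]
hits = sum(1 for a in trace if (pred.predict() == a, pred.observe(a))[0])
print(hits)  # 2 of the 3 second-pass requests are predicted correctly
```

A real predictor would bound the table size and predict early enough to hide DRAM latency, but the core idea (learning repeated successor relationships) is the same.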
Mitigating the effect of the large latency of load instructions is one of the challenges of micro-proces...
In this letter, dynamic content placement of a local cache server that can store a subset of content...
An increasing cache latency in next-generation processors incurs profound performance impa...
Cache memories are commonly implemented through multiple memory banks to improve bandwidth and laten...
Processor performance is directly impacted by the latency of the memory system. As processor core cy...
Efficient data supply to the processor is one of the keys to achieving high performance. However, ...
For many programs, especially integer codes, untolerated load instruction latencies account for a si...
With the increasing performance gap between the processor and the memory, the importance of caches i...
The increasing speed gap between processor microarchitectures and memory technologies can potentiall...
In this correspondence, we propose design techniques that may significantly simplify the cache acces...
Techniques for analyzing and improving memory referencing behavior continue to be important for achi...
While runahead execution is effective at parallelizing independent long-latency cache misses, it is ...
One major restriction to the performance of out-of-order superscalar processors is the latency of lo...
Hard-to-predict branches depending on long-latency cache-misses have been recognized as a major perf...
Summary: By examining the rate at which successive generations of processor and DRAM cycle tim...