The increase in working set sizes of contemporary applications outpaces the growth in cache sizes, resulting in frequent main memory accesses that deteriorate system performance due to the disparity between processor and memory speeds. Prefetching data blocks into the cache hierarchy ahead of demand accesses has proven successful at attenuating this bottleneck. However, spatial cache prefetchers operating in the physical address space leave significant performance on the table by limiting their pattern detection within 4KB physical page boundaries when modern systems use page sizes larger than 4KB to mitigate the address translation overheads. This paper exploits the high usage of large pages in modern systems to increase the effectivenes...
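The abstract's core observation is that a spatial prefetcher tracks which cache blocks are touched inside a fixed 4KB region and can never issue prefetches across that region's boundary, even when the OS backs the memory with a larger page. A minimal sketch of that boundary-limited behavior (a toy model, not the paper's actual design; all names are illustrative):

```python
# Toy spatial prefetcher sketch (hypothetical, for illustration only).
# It records which 64B blocks inside each 4KB region were touched and
# replays that footprint on later accesses to the same region. Note that
# its pattern detection is confined to one 4KB page: it cannot learn or
# prefetch across the boundary, even inside a 2MB large page.

PAGE_SIZE = 4096                            # detection region: one 4KB page
BLOCK_SIZE = 64                             # cache block size
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE   # 64 blocks per region

class SpatialPrefetcher:
    def __init__(self):
        # page number -> set of block offsets observed in that page
        self.footprints = {}

    def access(self, addr):
        """Record a demand access; return addresses to prefetch."""
        page = addr // PAGE_SIZE
        offset = (addr % PAGE_SIZE) // BLOCK_SIZE
        prefetches = []
        if page in self.footprints:
            # Replay the previously observed footprint. Every candidate
            # lies within the same 4KB region by construction.
            prefetches = [page * PAGE_SIZE + o * BLOCK_SIZE
                          for o in sorted(self.footprints[page])
                          if o != offset]
        self.footprints.setdefault(page, set()).add(offset)
        return prefetches
```

In this toy, two accesses that a program considers adjacent but that straddle a 4KB boundary land in different footprint entries, so no correlation between them is ever learned — the limitation the abstract targets.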
With explosive growth in dataset sizes and increasing machine memory capacities, per-application mem...
Despite large caches, main-memory access latencies still cause significant performance losses in man...
Recent research suggests that there are large variations in a cache's spatial usage, both within and...
Frequent Translation Lookaside Buffer (TLB) misses incur high performance and energy costs due to pa...
A well known performance bottleneck in computer architecture is the so-called memory wall. This term...
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarch...
Cache compression improves the performance of a multi-core system by being able to store mo...
The “Memory Wall”, the vast gulf between processor execution speed and memory latency, has led to th...
With off-chip memory access taking 100's of processor cycles, getting data to the processor in a tim...
Prefetch engines working on distributed memory systems behave independently by analyzing the...
In the last century great progress was achieved in developing processors with extremely high computa...
As technological process shrinks and clock rate increases, instruction caches can no longer be acces...
Prefetching is an important technique for reducing the average latency of memory accesses in scalabl...
In future multi-cores, large amounts of delay and power will be spent accessing data...