One of the significant challenges in processor architecture is overcoming memory latency. Prefetching can greatly improve cache performance; however, it has the drawback of cache pollution unless its aggressiveness is properly set. Although several prefetcher throttling techniques that use accuracy as a metric have been proposed, their robustness has not been sufficient because of variations in program working set sizes and cache capacities. In this paper, we revisit cache behavior from the viewpoint of data lifetime in a cache with prefetching. Based on this observation, Cache-Convection-Control-based Prefetch Optimization (CCCPO) is proposed, which exploits the characteristics of cache line reuse and controls the prefetcher aggressiveness. Evaluatio...
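To make the accuracy-based throttling mentioned in the abstract concrete, the following is a minimal sketch of that prior style of controller, not of CCCPO itself. The class name `AccuracyThrottle`, the thresholds, the doubling and halving policy, and the interval scheme are all illustrative assumptions; none of them come from the paper.

```python
from dataclasses import dataclass


@dataclass
class AccuracyThrottle:
    """Hypothetical accuracy-driven prefetch throttle (illustrative values only)."""

    degree: int = 2        # prefetches issued per trigger (the "aggressiveness")
    min_degree: int = 1    # kept above zero so accuracy can still be sampled
    max_degree: int = 8
    low: float = 0.40      # assumed threshold: below this accuracy, back off
    high: float = 0.75     # assumed threshold: above this accuracy, ramp up
    issued: int = 0        # prefetches issued in the current interval
    useful: int = 0        # prefetched lines that were later hit by a demand access

    def record_issue(self, n: int = 1) -> None:
        self.issued += n

    def record_useful(self, n: int = 1) -> None:
        self.useful += n

    def end_interval(self) -> int:
        """Adjust the degree from this interval's accuracy, then reset counters."""
        if self.issued:
            accuracy = self.useful / self.issued
            if accuracy < self.low:
                # Mostly useless prefetches: halve the degree to limit pollution.
                self.degree = max(self.min_degree, self.degree // 2)
            elif accuracy > self.high:
                # Mostly useful prefetches: double the degree, up to the cap.
                self.degree = min(self.max_degree, self.degree * 2)
        self.issued = 0
        self.useful = 0
        return self.degree
```

A controller of this kind reacts only to the ratio of useful to issued prefetches. It has no notion of how the working set size compares with the cache capacity, which is the robustness gap the abstract attributes to accuracy-based schemes.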
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarch...
With off-chip memory access taking hundreds of processor cycles, getting data to the processor in a tim...
The growing performance gap caused by high processor clock rates and slow DRAM accesses makes cache ...
High performance processors employ hardware data prefetching to reduce the negative performance impa...
As process scaling trends make the memory system an even more crucial bottleneck, the importance of ...
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term...
As data prefetching is used in embedded processors, it is crucial to reduce the wasted energy for im...
Abstract—A single parallel application running on a multi-core system shows sub-linear speedup becau...
Abstract — While numerous prior studies focused on performance and energy optimizations for caches,...
Processor performance has increased far faster than memories have been able to keep up with, forcing...
Chip multiprocessors (CMPs) present a unique scenario for software data prefetching with subtle trad...
Traditional software-controlled data cache prefetching is often ineffective due to the lack of runti...
Abstract. Given the increasing gap between processors and memory, prefetching data into cache become...
In multi-core systems, an application's prefetcher can interfere with the memo...