Hardware prefetchers are commonly used to hide and tolerate off-chip memory latency. Prefetching techniques in the literature are designed for multiple independent sequential applications running on a multicore system. In contrast to multiple independent applications, a single parallel application running on a multicore system exhibits different behavior. In the case of a parallel application, cores share and communicate data and code among themselves, and there is commonality in the demand miss streams across multiple cores. This gives an opportunity to predict the demand miss streams and communicate the predicted streams from one core to another, which we refer to as cross-core stream communication. We propose cross-core spatial streaming (XS...
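The abstract is cut off before it describes XS itself, so what follows is only a minimal sketch of the general idea of cross-core stream communication, under our own assumptions. We assume a generic spatial-pattern scheme: memory is divided into fixed-size regions, each core records which blocks of a region it touches, and those patterns are published to a shared table so that another core's first miss in the same region triggers prefetches of the whole recorded pattern. All names (`SharedPatternTable`, `Core`, `BLOCK_SIZE`, `REGION_BLOCKS`, `end_generation`) and parameter values are hypothetical illustrations, not the paper's design.

```python
from collections import defaultdict

# Hypothetical parameters (not from the paper): 64-byte blocks grouped
# into spatial regions of 32 blocks (2 KiB).
BLOCK_SIZE = 64
REGION_BLOCKS = 32


def region_and_offset(block):
    """Split a block number into (region number, block offset within region)."""
    return divmod(block, REGION_BLOCKS)


class SharedPatternTable:
    """Hypothetical shared structure: region -> set of block offsets a core touched."""

    def __init__(self):
        self.patterns = {}

    def publish(self, region, offsets):
        self.patterns[region] = frozenset(offsets)

    def lookup(self, region):
        return self.patterns.get(region)


class Core:
    def __init__(self, core_id, table):
        self.core_id = core_id
        self.table = table
        self.cache = set()                 # blocks present (demand-fetched or prefetched)
        self.recording = defaultdict(set)  # region -> offsets accessed this generation
        self.misses = 0
        self.prefetches = 0

    def access(self, addr):
        """Simulate one load; return True on a hit, False on a demand miss."""
        block = addr // BLOCK_SIZE
        region, offset = region_and_offset(block)
        trigger = region not in self.recording  # first access to this region
        self.recording[region].add(offset)
        hit = block in self.cache
        if not hit:
            self.misses += 1
            self.cache.add(block)
        if trigger:
            self._stream_region(region)
        return hit

    def _stream_region(self, region):
        """Prefetch every block another core previously touched in this region."""
        pattern = self.table.lookup(region)
        if pattern is None:
            return
        for off in sorted(pattern):
            block = region * REGION_BLOCKS + off
            if block not in self.cache:
                self.cache.add(block)
                self.prefetches += 1

    def end_generation(self):
        """Communicate this core's recorded patterns to the other cores."""
        for region, offsets in self.recording.items():
            self.table.publish(region, offsets)
        self.recording.clear()


# Usage: core 0 walks a shared data structure first, core 1 follows.
table = SharedPatternTable()
core0, core1 = Core(0, table), Core(1, table)
addrs = [0, 3 * BLOCK_SIZE, 7 * BLOCK_SIZE, 10 * BLOCK_SIZE]  # all in region 0
for a in addrs:
    core0.access(a)
core0.end_generation()
core1_hits = [core1.access(a) for a in addrs]
print(core0.misses, core1.misses, core1.prefetches, core1_hits)
```

In this toy run core 0 takes four demand misses, while core 1 takes only the triggering miss and receives the other three blocks as prefetches. The sketch records every access rather than only misses, so a core that benefited from prefetching can still republish the full pattern.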
The gap between processor speed and memory latency has led to the use of caches in the memory system...
The transition to multi-core architectures can be attributed mainly to fundamental limitations in cl...
As the difference in speed between processor and memory system continues to increase, it is...
Multicore architectures are becoming ubiquitous in the microprocessor market today. All major vendor...
Scaling the performance of applications with little thread-level parallelism is one of the most seri...
This dissertation considers the use of data prefetching and an alternative mechanism, data forwardin...
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term...
With rapidly increasing parallelism, DRAM performance and power have surfaced as primary constraints...
In multi-core systems, an application's prefetcher can interfere with the memo...
As many-core accelerators keep integrating more processing units, it becomes increasingly more diffi...
Both on-chip resource contention and off-chip latencies have a significant impact on memor...
While many parallel applications exhibit good spatial locality, other important codes in areas like ...
This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thr...
An effective method for reducing the effect of load latency in modern processors is data prefetching...