Ease of programming is one of the main impediments to the broad acceptance of multi-core systems that lack hardware support for transparent data transfer between local and global memories. A software cache is a robust way to give the user a transparent view of the memory architecture, but this software approach can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture designed to enable prefetching techniques. Memory accesses are classified at compile time into two classes: high-locality and irregular. Our approach then steers each memory reference toward one of two specific cache structures, each optimized for its respective access pattern. The specific cache structures are optimize...
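The abstract above is truncated before it describes the two cache structures, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' design. It assumes a direct-mapped cache with large lines and next-line prefetching for high-locality references, and a small fully associative LRU cache with short lines for irregular references. Global memory is modeled as a `bytearray`, slice copies stand in for DMA transfers, only reads are modeled, and every class and method name is hypothetical. The compile-time classification is represented by the caller choosing `read_high_locality` or `read_irregular`, the way a compiler might emit a different lookup routine for each class.

```python
from collections import OrderedDict


class HighLocalityCache:
    """Hypothetical direct-mapped cache: large lines plus next-line prefetch."""

    def __init__(self, memory: bytearray, num_lines: int = 4, line_size: int = 64):
        # At least two slots, so a prefetched line never evicts the line just loaded.
        assert num_lines >= 2
        self.memory = memory
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # global line number held in each slot
        self.lines = [bytearray(line_size) for _ in range(num_lines)]
        self.hits = self.misses = 0

    def _load(self, line_no: int) -> None:
        # Stand-in for a DMA transfer from global memory into local storage.
        slot = line_no % self.num_lines
        start = line_no * self.line_size
        chunk = self.memory[start:start + self.line_size]
        self.lines[slot][:len(chunk)] = chunk
        self.tags[slot] = line_no

    def read(self, addr: int) -> int:
        line_no, offset = divmod(addr, self.line_size)
        slot = line_no % self.num_lines
        if self.tags[slot] == line_no:
            self.hits += 1
        else:
            self.misses += 1
            self._load(line_no)
            # Assumed prefetch policy: a regular stream will likely touch the next line.
            if (line_no + 1) * self.line_size < len(self.memory):
                self._load(line_no + 1)
        return self.lines[slot][offset]


class IrregularCache:
    """Hypothetical fully associative cache: short lines, LRU replacement."""

    def __init__(self, memory: bytearray, capacity: int = 8, line_size: int = 8):
        self.memory = memory
        self.capacity = capacity
        self.line_size = line_size
        self.lines = OrderedDict()  # line number -> line contents, oldest first
        self.hits = self.misses = 0

    def read(self, addr: int) -> int:
        line_no, offset = divmod(addr, self.line_size)
        if line_no in self.lines:
            self.hits += 1
            self.lines.move_to_end(line_no)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used line
            start = line_no * self.line_size
            self.lines[line_no] = bytes(self.memory[start:start + self.line_size])
        return self.lines[line_no][offset]


class HybridSoftwareCache:
    """Routes each reference to the structure matching its (compile-time) class."""

    def __init__(self, memory: bytearray):
        self.hl = HighLocalityCache(memory)
        self.irr = IrregularCache(memory)

    def read_high_locality(self, addr: int) -> int:
        return self.hl.read(addr)

    def read_irregular(self, addr: int) -> int:
        return self.irr.read(addr)


memory = bytearray(i % 256 for i in range(1024))
cache = HybridSoftwareCache(memory)

# A loop a compiler might classify as high-locality: a[i] for i in 0..127.
stream_sum = sum(cache.read_high_locality(i) for i in range(128))

# An indirect access a[idx[j]] a compiler might classify as irregular.
idx = [900, 17, 513, 900, 17]
gathered = [cache.read_irregular(i) for i in idx]
```

In this sketch the streaming loop misses only once, because the prefetched second line is already resident when the loop reaches it, while the scattered reads fetch just 8 bytes per miss. That contrast is the intuition for separating the two access classes; the line sizes, capacities, and policies here are illustrative choices, not parameters taken from the paper.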
Increasing demands for energy efficiency constrain emerging hardware. These new hardware trends chal...
The memory system remains a major performance bottleneck in modern and future architectures. In this...
Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefet...
Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting propositi...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Many applications are memory-intensive and are therefore bound by memory latency and bandwidth. While i...
An ideal high-performance computer includes a fast processor and a multi-million-byte memory of comp...
The performance of a computing system heavily depends on the memory hierarchy. Fast but expensive ca...
We explore the use of compiler optimizations that optimize the layout of instructions in memory. T...
Software pipelining for instruction-level parallel computers with non-blocking caches usually assign...
Processor speed improves much faster than memory access time does. This makes memory accesse...
In order to mitigate the impact of the constantly widening gap between processor speed and main memo...