Task-based dataflow programming models and runtimes emerge as promising candidates for programming multicore and manycore architectures. These programming models analyze task dependencies dynamically at runtime and schedule independent tasks concurrently to the processing elements. In such models, cache locality, which is critical for performance, becomes more challenging in the presence of fine-grain tasks and in architectures with many simple cores. This paper presents a combined hardware-software approach to improve cache locality and offer better performance in terms of execution time and energy in the memory system. We propose the explicit bulk prefetcher (EBP) and epoch-based cache management (ECM) to help runtimes prefetch task...
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarch...
Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting propositi...
Dependable real-time systems are essential to time-critical applications. The systems that run these...
Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefet...
With processor speeds continuing to outpace the memory subsystem, cache missing memory operations co...
The memory system remains a major performance bottleneck in modern and future architectures. In this...
Data prefetching is an effective technique to hide memory latency and thus bridge the increasing pro...
With off-chip memory access taking hundreds of processor cycles, getting data to the processor in a tim...
Processor performance has increased far faster than memories have been able to keep up with, forcing...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Despite large caches, main-memory access latencies still cause significant performance losses in man...
An ideal high performance computer includes a fast processor and a multi-million byte memory of comp...
Abstract. Given the increasing gap between processors and memory, prefetching data into cache become...
In this paper, we present our design of a high performance prefetcher, which exploits various locali...
As data prefetching is used in embedded processors, it is crucial to reduce the wasted energy for im...