Pre-execution attacks cache misses for which conventional address-prediction-driven prefetching is ineffective. In pre-execution, copies of cache-miss computations are isolated from the main program and launched as separate threads, called p-threads, whenever the processor anticipates an upcoming miss. P-thread selection, the task of deciding which computations should execute on p-threads and when they should be launched so that total execution time is minimized, is central to the success of pre-execution. We introduce a framework for automated static p-thread selection, a static p-thread being one whose dynamic instances are repeatedly launched during the course of program execution. Our approach is to formalize the pro...
It is critical to provide high performance for scientific programs running on a Chip Multi-Processor...
Cache interference is found to play a critical role in optimizing cache allocation among concurr...
Hyper-threaded systems show an increase in popularity in modern computers due to the performance imp...
Pre-execution is a promising latency tolerance technique that uses one or more hel...
This paper describes a method to improve the cache locality of sequential programs by scheduling fin...
Pre-execution is a promising latency tolerance technique that uses one or more helper threads runnin...
Techniques for analyzing and improving memory referencing behavior continue to be important for achi...
Pre-execution is a promising latency tolerance technique that uses one or more helper threads runnin...
The need to provide performance guarantees in high-performance servers has long been neglected. Prov...
A single parallel application running on a multi-core system shows sub-linear speedup becau...
This paper describes future execution (FE), a simple hardware-only technique to accelerate individu...
Time predictability is one of the most important design considerations for real-time systems. In thi...
Scaling the performance of applications with little thread-level parallelism is one of the most seri...
This article investigates several source-to-source C compilers for extracting pre-execution thread c...
Caches have become invaluable for higher-end architectures to hide, in part, the increasing gap betw...