This article investigates several source-to-source C compilers for extracting pre-execution thread code automatically, thus relieving the programmer or hardware from this onerous task. We present an aggressive profile-driven compiler that employs three powerful algorithms for code extraction. First, program slicing removes non-critical code for computing cache-missing memory references. Second, prefetch conversion replaces blocking memory references with non-blocking prefetch instructions to minimize pre-execution thread stalls. Finally, speculative loop parallelization generates thread-level parallelism to tolerate the latency of blocking loads. In addition, we present four "reduced" compilers that employ less aggressive algorith...
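The slicing and prefetch-conversion steps named in the abstract above can be illustrated with a small hand-written sketch. It is not output from the article's compilers. The function names `sum_indirect` and `sum_indirect_helper`, the indirect-array workload, and the assumption that `data[idx[i]]` is the cache-missing reference are all hypothetical. `__builtin_prefetch` is the GCC/Clang builtin, used here as a stand-in for a non-blocking prefetch instruction. Speculative loop parallelization and the actual thread launch are not shown.

```c
#include <stddef.h>

/* Main computation. We assume (for illustration only) that profiling
 * flagged the load data[idx[i]] as the cache-missing reference. */
long sum_indirect(const long *data, const size_t *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]] * 3 + 1;  /* arithmetic does not feed the address */
    return sum;
}

/* Hypothetical helper-thread body after two transformations:
 *   1. slicing: the multiply-add and the accumulation are dropped because
 *      they are not needed to compute the address &data[idx[i]];
 *   2. prefetch conversion: the blocking load of data[idx[i]] becomes a
 *      non-blocking prefetch, so the helper does not stall on the miss.
 * The load of idx[i] stays a real load because the address depends on it.
 * Returns the number of prefetches issued, purely so the sketch is testable. */
size_t sum_indirect_helper(const long *data, const size_t *idx, size_t n)
{
    size_t issued = 0;
    for (size_t i = 0; i < n; i++) {
        __builtin_prefetch(&data[idx[i]], 0, 3);  /* 0 = read, 3 = high locality */
        issued++;
    }
    return issued;
}
```

In a real pre-execution setup the helper would run in a spare hardware context ahead of the main loop; here it is an ordinary function so the two versions can be compared side by side.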
Memory size is an important economic factor in the development of embedded systems. It is therefore ...
This tutorial considers the design of modern machine-independent optimising compilers for classica...
This paper describes transformation techniques for out-of-core programs (i.e., those that deal with...
Pre-execution is a promising latency tolerance technique that uses one or more helper threads runnin...
Pre-execution is a promising latency tolerance technique that uses one or more helper threads runnin...
Pre-execution is a promising latency tolerance technique that uses one or more hel...
Pre-execution is a novel latency-tolerance technique where one or more helper threads run in front o...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Helping programmers write parallel software is an urgent problem given the popularity of m...
Pre-execution attacks cache misses for which conventional address-prediction driven prefetching is i...
Pre-execution uses helper threads running in spare hardware contexts to trigger cache misses in fron...
Using only the internal program and data memory of a microcontroller can save large costs in embedde...
It is well-known that today's compilers and state-of-the-art libraries have th...
A major performance limiter in modern processors is the long latencies caused by data cache misses. ...
This paper describes the design and implementation of an optimizing compiler that automatically gene...