Software pipelining for instruction-level parallel computers with non-blocking caches usually assigns memory-access latency by assuming either that all accesses are cache hits or that all are cache misses. We contend that setting memory latencies via cache-reuse analysis leads to better software pipelining than either the all-hit or the all-miss assumption. Using a simple cache-reuse model, our software-pipelining optimization achieved 10% better execution performance than the all-cache-hit assumption and used 18% fewer registers than the all-cache-miss assumption required. We conclude that software pipelining for architectures with non-blocking caches should incorporate a memory-reuse model.
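The per-access latency assignment the abstract argues for can be sketched as follows. This is a minimal illustration, not the paper's implementation: the stride-based hit/miss rule, the latency values, and the example loop are all assumptions chosen for clarity.

```python
# Hypothetical sketch: assign per-load latencies from a simple cache-reuse
# model instead of assuming all hits or all misses, then derive the
# recurrence-constrained initiation interval (RecMII) for modulo scheduling.
import math

HIT_LATENCY = 2     # assumed L1-hit latency (cycles)
MISS_LATENCY = 20   # assumed miss latency (cycles)

def classify_load(stride_bytes, line_size=32):
    """A load whose per-iteration stride stays within one cache line reuses
    the line fetched by an earlier iteration, so we predict a hit; larger
    strides are predicted to miss. (Deliberately simplistic.)"""
    return HIT_LATENCY if abs(stride_bytes) < line_size else MISS_LATENCY

def rec_mii(cycles):
    """RecMII = max over dependence cycles of ceil(cycle latency / distance)."""
    return max(math.ceil(lat / dist) for lat, dist in cycles)

# Example loop: a[i] = a[i-1] + b[4*i]  (8-byte elements)
lat_a = classify_load(8)    # stride 8 bytes  -> predicted hit  (2 cycles)
lat_b = classify_load(128)  # stride 128 bytes -> predicted miss (20 cycles)

# Only the recurrence through a[] constrains II (dependence distance 1);
# the cycle carries the load of a[i-1] plus a 1-cycle add.
print(rec_mii([(lat_a + 1, 1)]))  # -> 3
```

Note that the predicted-miss load of `b` lies off the recurrence, so its long latency lengthens the schedule and register lifetimes but not the II; this is exactly why modeling each access separately can use fewer registers than a blanket all-miss assumption while scheduling more aggressively than all-hit.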
The growing complexity of modern computer architectures increasingly compli...
Guaranteeing time-predictable execution in real-time systems involves the management of not only pro...
As the speed gap between CPU and memory widens, memory hierarchy has become the primary factor limit...
Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefet...
Software pipelining is an instruction scheduling technique that exploits the instruction level paral...
Contention for shared cache resources has been recognized as a major bottleneck for multicores—espec...
Software pipelining tries to improve the performance of a loop by overlapping the executio...
This paper formulates and shows how to solve the problem of selecting the cache size and depth of ca...
Ease of programming is one of the main impediments for the broad acceptance of multi-core systems wi...
register allocation, modulo scheduling, software pipelining, instruction scheduling, code generation...
In this paper we present a method for determining the cache performance of the loop nests in a progr...
Low-latency data access is essential for performance. To achieve this, processors use fast first-lev...
In computer systems, latency tolerance is the use of concurrency to achieve high performance in spit...
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit so...