A Zero Overhead Loop Buffer (ZOLB) is an architectural feature commonly found in digital signal processors (DSPs). The buffer can be viewed as a compiler-managed cache that holds a sequence of instructions to be executed a specified number of times. Unlike techniques such as loop unrolling, a loop buffer is a hardware technique that minimizes loop overhead without the penalty of increased code size. A ZOLB also requires relatively little space and power, both of which are important considerations for most DSP applications. This paper describes strategies for generating code that uses a ZOLB effectively. The authors have found that many common improving transformations used by optimizing compilers to improve code on co...
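The trade-off described in the abstract above (loop overhead versus code size) can be illustrated with a small counting model. This is a minimal sketch, not the paper's method: the function names, the assumed three overhead instructions per iteration (increment, compare, branch), and the single ZOLB setup instruction are all hypothetical values chosen for illustration.

```python
def conventional_loop(body_len, trip_count, unroll=1, overhead=3):
    """Return (static code size, instructions executed) for a software loop.

    Assumption: each trip around the loop pays `overhead` extra instructions
    (increment, compare, branch), and `trip_count` is divisible by `unroll`.
    """
    if trip_count % unroll != 0:
        raise ValueError("trip_count must be divisible by unroll in this model")
    static_size = body_len * unroll + overhead
    executed = static_size * (trip_count // unroll)
    return static_size, executed


def zolb_loop(body_len, trip_count, setup=1):
    """Return (static code size, instructions executed) for a ZOLB loop.

    Assumption: one hypothetical setup instruction loads the buffer and the
    trip count; the hardware then repeats the body with no per-iteration
    overhead instructions.
    """
    static_size = body_len + setup
    executed = setup + body_len * trip_count
    return static_size, executed


if __name__ == "__main__":
    body_len, trip_count = 4, 100
    print("plain loop   :", conventional_loop(body_len, trip_count))
    print("unrolled x4  :", conventional_loop(body_len, trip_count, unroll=4))
    print("ZOLB         :", zolb_loop(body_len, trip_count))
```

Under these assumed costs, unrolling reduces the number of executed instructions only by growing the stored loop body, whereas the modeled ZOLB loop executes fewer instructions while its static size stays close to that of the original body.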
We review the evolution of DSP architectures and compiler technology, and describe how compiler tech...
In this work, we present a minimalistic, energy efficient implementation of instruction buffer. We u...
Energy consumption in embedded systems is strongly dominated by instruction memory organiza...
For multimedia applications, loop buffering is an efficient mechanism to reduce the power in the ins...
Over the past decade, microprocessor design strategies have focused on increasing the computational ...
Transport Triggered Architecture (TTA) processors allow unique low level compiler optimizations such...
Software pipelining is an effective technique to reduce cycle count by exploiting instruction level ...
A loop buffer is a memory located between CPU and level one instruction cache, called IL1 hereafter....
For multimedia applications, loop buffering is an efficient mechanism to reduce the power in the ins...
Several loop-buffering techniques were proposed for reducing power consumption of embedd...
Recently, several loop buffer designs have been proposed to reduce instruction fetch energy...
For multimedia applications, loop buffering is an efficient mechanism to reduce the power in the ins...
Recent studies show that very long instruction word (VLIW) architectures, which inherently have wide...
Parallelizing compiler technology has improved in recent years. One area in which compilers have ma...
In this paper we propose a technique that uses an additional mini cache located between the I-Cache...