We present a technique for increasing the throughput of stream processing architectures by removing the bottlenecks caused by loop structures. We implement loops as self-timed pipelined rings that can operate on multiple data sets concurrently. Our contribution includes a transformation algorithm which takes as input a high-level program and gives as output the structure of an optimized pipeline ring. Our technique handles nested loops and is further enhanced by loop unrolling. Simulations run on benchmark examples show a 1.3x to 4.9x speedup without unrolling and a 2.6x to 9.7x speedup with twofold loop unrolling.
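To illustrate why letting several data sets share a loop can raise throughput, the following is a minimal toy model and not the transformation algorithm or the simulator described above. It assumes a ring of `stages` equal-latency stages that advance in lockstep (the rings in the abstract are self-timed, so this is a simplification), a fixed number of `iterations` per data set, and at most one data set per stage. The function name `ring_cycles` and all parameters are hypothetical.

```python
def ring_cycles(num_datasets, stages, iterations, max_in_flight):
    """Count time steps needed to push all data sets through a loop ring.

    Toy assumptions (not taken from the paper): every stage takes one time
    step, each data set circulates `iterations` times around `stages` stages,
    and a data set may enter only when stage 0 is free.
    """
    work = stages * iterations   # steps one data set spends in the ring
    entry_times = []             # entry time of each data set still in the ring
    waiting = num_datasets       # data sets that have not entered yet
    t = 0
    while waiting or entry_times:
        # Retire data sets that have completed all of their iterations.
        entry_times = [t0 for t0 in entry_times if t - t0 < work]
        # A data set that entered at t0 sits at stage (t - t0) % stages.
        stage0_busy = any((t - t0) % stages == 0 for t0 in entry_times)
        if waiting and len(entry_times) < max_in_flight and not stage0_busy:
            entry_times.append(t)
            waiting -= 1
        if waiting or entry_times:
            t += 1
    return t


if __name__ == "__main__":
    one_at_a_time = ring_cycles(8, stages=4, iterations=3, max_in_flight=1)
    shared_ring = ring_cycles(8, stages=4, iterations=3, max_in_flight=4)
    print(one_at_a_time, shared_ring, one_at_a_time / shared_ring)
```

With `max_in_flight=1` the loop behaves like a conventional bottleneck that admits a new data set only after the previous one leaves; raising the limit to the number of stages keeps every stage busy. The ratio printed here is a property of this toy model only and should not be read as one of the benchmark speedups reported in the abstract.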
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or f...
Pipelining algorithms are typically concerned with improving only the steady-state performance, or t...
Stream processing applications use online analytics to ingest high-rate data sources, process them o...
Data-driven array architectures seem to be important alternatives for coarse-grained reconfigurable ...
This paper presents a new approach for automatically pipelining sequential circuits. The approach r...
Developing efficient programs for many of the current parallel computers is not easy due to the arch...
It is well-known that, to optimize a program for speed-up, efforts should be focused on the regions ...
Parallel processing has gained increasing importance over the last few years. A key aim of parallel ...
This paper addresses the problem of Time-Constrained Loop Pipelining, i.e. given a fixed throughput,...
© 1996 IEEE To take advantage of recent architectural improvements in microprocessors, advanced comp...
Parallelizing compilers do not handle loops in a satisfactory manner. Fine-grain transformations ...
This paper presents a mathematical model for the loop pipelining problem that considers several para...
Parallelizing compilers promise to exploit the parallelism available in a given program, particularl...
We address the problem of generating compact code from software pipelined loops. Although software p...
Loop pipelining is widely adopted as a key optimization method in high-level synthesis (HLS). Howeve...