The presence of multiple active threads on the same processor can mask latency by rapid context switching, but it can adversely affect performance due to competition for shared datapath resources. In this paper we present Macro Software Pipelining (MSWP), a loop scheduling technique for multithreaded processors, which is based on the loop distribution transformation for loop pipelining. MSWP constructs loop schedules by partitioning the loop body into tasks and assigning each task to a thread that executes all iterations for that particular task. MSWP is applied top-down on a hierarchical program representation, and utilizes thread-level speculation for maximal exploitation of parallelism. We tested MSWP on a multithreaded architectural mod...
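The partitioning idea described in this abstract (split the loop body into tasks, and let one thread execute all iterations of its task) can be illustrated with a minimal sketch. It is not the MSWP algorithm itself: it assumes a fixed three-task split, uses FIFO queues to pass values between stages, and models neither the top-down hierarchical application nor thread-level speculation. The names `run_pipelined`, `load`, `compute`, and `store` are hypothetical and chosen only for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the iteration stream


def run_pipelined(n_iters, stages):
    """Run each stage in its own thread; stage k handles task k for every iteration."""
    # queues[k] feeds stage k; queues[-1] collects the final results.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        # One thread per task: consume iterations in order, pass results downstream.
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, queues[k], queues[k + 1]))
        for k, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Feed the iteration indices of the original loop into the first stage.
    for i in range(n_iters):
        queues[0].put(i)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical loop body, split into three tasks:
#   for i in range(n): out.append(store(compute(load(i))))
def load(i):
    return i * 2


def compute(x):
    return x + 1


def store(x):
    return x * x


if __name__ == "__main__":
    print(run_pipelined(5, [load, compute, store]))
```

Because each stage is served by a single thread and the queues are FIFO, results come out in iteration order. In CPython the global interpreter lock prevents real parallel speedup for such small tasks, so the sketch shows only the structure of the schedule, not its performance.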
This work examines the interaction of compiler scheduling techniques with processor features such as...
Large, high-frequency single-core chip designs are increasingly being replaced with larger chip mult...
In processors with several levels of hardware resource sharing, like CMPs in which each core is an S...
Loop scheduling on multithreaded processors differs significantly from loop scheduling on other parallel processors. The sha...
226 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2000. We tested MSWP on a Coral 200...
In this paper, we propose a compiler method for software pipelining of loop nests on multi-core chip...
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or f...
With the current trend of multiprocessor machines towards more and more hierarchical architectures, ...
Pipelining is an important technique in high-level synthesis, which overlaps the execution of succes...
We present a user-level thread scheduler for shared-memory multiprocessors, and we analyze its perfo...