It has been suggested that non-scientific code has very little parallelism not already exploited by existing processors. In this paper we show that, contrary to this notion, there is actually a significant amount of unexploited parallelism in typical general-purpose code. To exploit this parallelism, a combination of hardware and software techniques must be applied. We analyze three techniques: dynamic scheduling, speculative execution, and basic block enlargement. We will show that, for narrow instruction words, little is indeed to be gained by applying these techniques. However, as the number of simultaneous operations increases, it becomes possible to achieve speedups of three to six on realistic processors.
We present a transformational system for extracting parallelism from programs. Our transformations g...
This thesis investigates parallelism and hardware design trade-offs of parallel and pipelined archit...
Advanced many-core CPU chips already have a few hundred processing cores (e.g. 160 cores in an IBM...
This dissertation demonstrates that through the careful application of hardware and software techniq...
The increasing density of VLSI circuits has motivated research into ways to utilize large area budge...
Business demands better computing power because the cost of hardware is declining day by day. Th...
With speculative parallelization, code sections that cannot be fully analyzed by the compiler are ag...
Abstract. The traditional target machine of a parallelizing compiler can execute code sections eithe...
To run a software application on a large number of parallel processors, N, and expect to obtain spee...
The goal of parallelizing, or restructuring, compilers is to detect and exploit parallelism in seque...
Coarse-grained task parallelism exists in sequential code and can be leveraged to boost the use of ...
Over the past two decades tremendous progress has been made in both the design of parallel architect...
Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to...
In order to utilize parallel computers, four approaches, broadly speaking, to the provision of paral...
Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to...