A wide variety of computer architectures have been proposed to exploit parallelism at different granularities. These architectures have significant differences in instruction scheduling constraints, memory latencies, and synchronization overhead, making it difficult to determine which architecture can achieve the best performance on a given program. Trace-driven simulations and analytic models are used to compare the instruction-level parallelism of a superscalar processor and a pipelined processor with the loop-level parallelism of a shared memory multiprocessor. It is shown that the maximum speedup for a loop with a cyclic dependence graph is limited by its critical dependence ratio, independent of the number of iterations in the loop. Th...
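As a reading aid, here is a minimal sketch of the bound this abstract states, assuming the standard recurrence formulation from the software-pipelining literature; the symbols R, l(C), d(C), and W below are our notation, not necessarily the paper's. For each cycle C in the loop's dependence graph, let l(C) be the total operation latency around the cycle and d(C) the total dependence distance (the number of iterations the cycle spans). The critical dependence ratio is then

    R = \max_{C \in \text{cycles}} \frac{l(C)}{d(C)},

so a new iteration can start at most once every R time units, no matter how many processors or functional units are available. If W is the sequential work of one iteration, the achievable speedup is bounded by

    \text{speedup} \le \frac{W}{R},

which is independent of the number of iterations executed, matching the claim above.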
The potential for higher performance from increasing on-chip transistor densities, on the one hand, ...
Cache coherence is one of the main challenges to tackle when designing a shared-memory multiprocesso...
Although it is convenient to program large-scale multiprocessors as though all processors shared acc...
200 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1993. The use of a private cache in...
Reducing memory latency is critical to the performance of large-scale parallel systems. Due to the t...
Although improved device technology has increased the performance of computer systems, fundamental h...
Due to VLSI lithography problems and the limitations of additional architectural enhancements, uniproc...
Interest in multitasked multiprocessor systems is motivated by the necessity to increase throughput ...
This dissertation explores techniques for reducing the costs of inter-processor communication i...
Shared-memory multiprocessors built from commodity microprocessors are increasingly being used to pr...
The last decade has produced enormous improvements in processor speeds without a corresponding impro...
Data used by parallel programs can be divided into classes, based on how threads access it. For di...
In this paper, we study a hardware-supported, compiler-directed (HSCD) cache coherence scheme, which...
Multiprocessors with shared memory are considered more general and easier to program than message-pa...