Run-time parallelization is often the only way to execute the code in parallel when data dependence information is incomplete at compile time. This situation is common in many important applications. Unfortunately, known techniques for run-time parallelization are often computationally expensive or not general enough. To address this problem, we propose new hardware support for efficient run-time parallelization in distributed shared-memory (DSM) multiprocessors. The idea is to execute the code in parallel speculatively and use extensions to the cache coherence protocol hardware to detect any dependence violations. As soon as a dependence violation is detected, execution stops, the state is restored, and the code is re-executed serially. This scheme, w...
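The speculate / detect / restore cycle described above can be illustrated with a small software analogy. The sketch below is a minimal C++ illustration, not the paper's mechanism: shadow per-element read/write flags stand in for the access bits that the extended coherence protocol hardware would maintain, and the dependence test runs after the speculative pass (LRPD-style) rather than eagerly, as the hardware would. The loop body, array names, and index pattern are all assumptions chosen for the example.

```cpp
// Sketch: speculative parallel execution of a loop whose dependences
// are unknown at compile time, with software "access bits" and rollback.
#include <cstdio>
#include <vector>

int main() {
    const int n = 8;
    std::vector<double> a(2 * n, 1.0);
    // Iteration i writes a[i] and reads a[r[i]]; r[] is known only at
    // run time, so a compiler cannot prove the loop parallel.
    // r[4] = 3 creates a cross-iteration flow dependence (4 reads what 3 writes).
    std::vector<int> r = {8, 9, 10, 11, 3, 13, 14, 15};

    std::vector<double> checkpoint = a;              // state saved before speculating
    std::vector<char> wr(2 * n, 0), rd(2 * n, 0);    // shadow access bits per element
    bool output_dep = false;

    // "Speculative parallel" phase, simulated here by visiting iterations
    // in reverse order to stand in for an arbitrary parallel schedule.
    for (int i = n - 1; i >= 0; --i) {
        rd[r[i]] = 1;                  // record the read
        if (wr[i]) output_dep = true;  // a second writer would be an output dependence
        wr[i] = 1;                     // record the write
        a[i] = a[r[i]] + 1.0;
    }

    // Test: any element both read and written implies a possible
    // cross-iteration dependence (conservative, in the style of the LRPD test).
    bool violated = output_dep;
    for (int e = 0; e < 2 * n; ++e)
        if (wr[e] && rd[e]) violated = true;

    if (violated) {
        a = checkpoint;                                    // restore saved state
        for (int i = 0; i < n; ++i) a[i] = a[r[i]] + 1.0;  // re-execute serially
        std::puts("dependence violation: state restored, loop re-run serially");
    }
    std::printf("a[4] = %.1f\n", a[4]);  // 3.0: serially, a[3] becomes 2.0 first
    return 0;
}
```

Here the speculative pass computes a stale a[4] (it reads a[3] before iteration 3 updates it), the shadow bits expose the overlap on element 3, and the serial re-run produces the correct value. If r[4] instead pointed into the read-only half of the array, no element would be both read and written, the test would pass, and the speculative results would be kept.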
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops becau...
Shared memory is widely regarded as a more intuitive model than message passing for the development ...
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops becau...
Speculative parallel execution of statically non-analyzable codes on Distributed Shared-Memory (DSM)...
Data dependence speculation allows a compiler to relax the constraint of data-independence to issue ...
185 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1999. The simulation results show t...
A wide variety of computer architectures have been proposed to exploit parallelism at different gran...
With speculative thread-level parallelization, codes that cannot be fully compiler-analyzed are aggr...
Emerging multiprocessor architectures such as chip multiprocessors, embedded architectures, and mas...
Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelize...
Maximal utilization of cores in multicore architectures is key to realizing the potential performance ...
108 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2001. In this thesis, we also propo...
High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster...
The advent of multicores presents a promising opportunity for speeding up the execution of sequentia...
Architects have adopted the shared memory model that implicitly manages cache coherence and cache ca...