We consider a variety of dynamic, hardware-based methods for exploiting load/store parallelism, including mechanisms that use memory dependence speculation. While previous work has also investigated such methods [19,4], this has been done primarily for split, distributed window processor models. We focus on centralized, continuous-window processor models (the common configuration today). We confirm that exploiting load/store parallelism can greatly improve performance. Moreover, we show that much of this performance potential can be captured if addresses of the memory locations accessed by both loads and stores can be used to schedule loads. However, using addresses to schedule load execution may not always be an option due to complexity, ...
Dynamically finding parallelism in sequential applications with hardware mechanisms is typically lim...
Various memory consistency model implementations (e.g., x86, SPARC) willfully allow a core to see it...
The increasing density of VLSI circuits has motivated research into ways to utilize large area budge...
As the existing techniques that empower the modern high-performance processors are being refined and...
Speculative parallelization (SP) enables a processor to extract multiple threads from a single seque...
One of the main performance bottlenecks of processors today is the discrepancy between processor and...
Store-queue-free architectures remove the store queue and use memory cloaking to communicate in-flig...
Future multi-core and many-core processors are likely to contain one or more high performance out-of...
We explore various intervals between register assignments and the subsequent uses of the same regis...
Memory dependence prediction allows out-of-order issue processors to achieve high degrees of instru...
In high-end processors, increasing the number of in-flight instructions can improve performance by o...
Memory dependence prediction allows out-of-order issue processors to achieve high degrees of instruc...
Modern out-of-order processor architectures focus significantly on the high performance execution of...
Speculative parallelization (SP) enables a processor to extract multiple threads from a sequential i...
The problem of extracting Instruction-Level Parallelism at levels of 10 instructions per clock and hig...