Conventional dynamically scheduled processors often use a fully associative structure, the load/store queue (LSQ), to communicate values between loads and older in-flight stores and to detect store-load ordering violations. However, this in-flight forwarding accounts for only about 15% of all store-load communications, which makes the CAM-based micro-architecture the major bottleneck to scaling store-load communication further. This paper presents a new micro-architecture named ASW (short for active store window). It provides a new structure, the speculative active store window, that implements more aggressive speculative store-load forwarding than a conventional LSQ. This structure can forward the data of committed stores to the exec...
CPR/CFP (Checkpoint Processing and Recovery/Continual Flow Pipeline) support an adaptive instruction...
Speculative parallelization (SP) enables a processor to extract multiple threads from a single seque...
Keywords: memory disambiguation, load-forwarding, speculation. The superscalar processor must issue instruction...
Because they are based on large content-addressable memories, load-store queues (LSQ) present implem...
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding....
Conventional superscalar processors usually contain large CAM-based LSQ (load/store queue) with poor...
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
This paper presents NoSQ (short for No Store Queue), a microarchitecture that performs store-load co...
Modern processors use CAM-based load and store queues (LQ/SQ) to support out-of-order memory schedul...
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
In an out-of-order core, the load queue (LQ), the store queue (SQ), and the store buffer (SB) are re...
The load-store unit is a performance critical component of a dynamically-scheduled processor. It is ...
The load-store queue (LQ-SQ) of modern superscalar processors is responsible for keeping the order of...
Various memory consistency model implementations (e.g., x86, SPARC) willfully allow a core to see it...
In high-end processors, increasing the number of in-flight instructions can improve performance by o...