This paper presents NoSQ (short for No Store Queue), a microarchitecture that performs store-load communication without a store queue and without executing stores in the out-of-order engine. NoSQ implements store-load communication using speculative memory bypassing (SMB), the dynamic short-circuiting of DEF-store-load-USE chains to DEF-USE chains. Whereas previous proposals used SMB as an opportunistic complement to conventional store queue-based forwarding, NoSQ uses SMB as a store queue replacement. NoSQ relies on two supporting mechanisms. The first is an advanced store-load bypassing predictor that for a given dynamic load can predict whether that load will bypass and the identity of the communicating store. The second is an efficient ...
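The short-circuiting of a DEF-store-load-USE chain into a DEF-USE chain can be illustrated with a toy rename stage. This is a minimal sketch, not the NoSQ design itself: the `Renamer` class, its method names, and the string store identifiers are all hypothetical, and the sketch assumes that bypassing is done by mapping the load's destination register to the physical register that holds the store's data. It takes the predictor's output as given and does not model how a prediction is verified or how a misprediction is recovered.

```python
from dataclasses import dataclass, field


@dataclass
class Renamer:
    """Toy rename stage (hypothetical; for illustration only)."""
    map_table: dict = field(default_factory=dict)        # architectural reg -> physical reg
    store_data_preg: dict = field(default_factory=dict)  # store id -> physical reg holding its data
    next_preg: int = 0

    def _alloc(self) -> int:
        # Hand out a fresh physical register number.
        preg = self.next_preg
        self.next_preg += 1
        return preg

    def rename_def(self, dst: str) -> int:
        # DEF: the producing instruction gets a fresh physical register.
        self.map_table[dst] = self._alloc()
        return self.map_table[dst]

    def rename_store(self, store_id: str, data_src: str) -> None:
        # Remember which physical register supplies the store's data.
        self.store_data_preg[store_id] = self.map_table[data_src]

    def rename_load(self, dst: str, predicted_store=None):
        # If the (assumed) predictor names an in-flight store, point the
        # load's destination at that store's data register instead of
        # allocating a new one. Returns (physical reg, bypassed?).
        if predicted_store in self.store_data_preg:
            self.map_table[dst] = self.store_data_preg[predicted_store]
            return self.map_table[dst], True
        self.map_table[dst] = self._alloc()
        return self.map_table[dst], False

    def rename_use(self, src: str) -> int:
        # USE: consumers read whatever the map table currently says.
        return self.map_table[src]


# DEF r1; store r1 -> [A]; load r2 <- [A]; USE r2
r = Renamer()
def_preg = r.rename_def("r1")
r.rename_store("st0", "r1")
load_preg, bypassed = r.rename_load("r2", predicted_store="st0")
use_preg = r.rename_use("r2")
```

In this sketch the consumer of `r2` ends up reading the same physical register that the original DEF wrote, so the dependence chain no longer passes through the store or the load. A load with no predicted store simply receives a fresh physical register.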
The load-store queue (LQ-SQ) of modern superscalar processors is responsible for keeping the order of...
Dynamically scheduled high-level synthesis (HLS) enables the use of load-store queues (LSQs) which c...
Store-queue-free architectures remove the store queue and use memory cloaking to communicate in-flig...
The NoSQ microarchitecture performs store-load communication without a store queue and without execu...
Conventional dynamically scheduled processors often use fully associative structures named load/stor...
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding....
Modern processors use CAM-based load and store queues (LQ/SQ) to support out-of-order memory schedul...
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
Because they are based on large content-addressable memories, load-store queues (LSQ) present implem...
Future multi-core and many-core processors are likely to contain one or more high performance out-of...
The load-store unit is a performance critical component of a dynamically-scheduled processor. It is ...
Conventional superscalar processors usually contain large CAM-based LSQ (load/store queue) with poor...
In an out-of-order core, the load queue (LQ), the store queue (SQ), and the store buffer (SB) are re...
Various memory consistency model implementations (e.g., x86, SPARC) willfully allow a core to see it...