Modern processors use CAM-based load and store queues (LQ/SQ) to support out-of-order memory scheduling and store-to-load forwarding. However, the LQ and SQ scale poorly to the sizes required for large-window, high-ILP processors. Past research has proposed ways to make the SQ more scalable by reorganizing the CAMs or using non-associative structures. In particular, the Store Queue Index Prediction (SQIP) approach allows load instructions to predict the exact SQ index of a sourcing store and access the SQ in a much simpler and more scalable RAM-based fashion. SQIP works because loads that receive data directly from stores will usually receive the data from the same store each time. In our work, we take a slightly differen...
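The index-prediction idea summarized in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the SQIP hardware design: the class names (`StoreQueue`, `IndexPredictor`), the two dictionary-based tables, and the fallback search used for training are all hypothetical choices made for illustration. It shows a load first consulting a predictor keyed by its PC, then reading a single SQ entry by index instead of searching every entry associatively.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    pc: int
    addr: int
    data: int


class StoreQueue:
    """Circular store queue supporting indexed (RAM-style) reads."""

    def __init__(self, size=8):
        self.size = size
        self.entries = [None] * size
        self.tail = 0  # total number of stores allocated so far

    def allocate(self, pc, addr, data):
        idx = self.tail % self.size
        self.entries[idx] = StoreEntry(pc, addr, data)
        self.tail += 1
        return idx

    def read(self, idx):
        return self.entries[idx]

    def search_youngest(self, addr):
        # Associative (CAM-style) scan from youngest to oldest store.
        # Assumption: used here only to train the predictor.
        for age in range(1, min(self.tail, self.size) + 1):
            idx = (self.tail - age) % self.size
            entry = self.entries[idx]
            if entry is not None and entry.addr == addr:
                return idx
        return None


class IndexPredictor:
    """Hypothetical predictor: load PC -> store PC -> latest SQ index."""

    def __init__(self):
        self.load_to_store_pc = {}
        self.store_pc_to_index = {}

    def note_store(self, store_pc, sq_index):
        self.store_pc_to_index[store_pc] = sq_index

    def train(self, load_pc, store_pc):
        self.load_to_store_pc[load_pc] = store_pc

    def predict(self, load_pc):
        store_pc = self.load_to_store_pc.get(load_pc)
        if store_pc is None:
            return None
        return self.store_pc_to_index.get(store_pc)


def execute_load(sq, pred, load_pc, addr, memory):
    """Return (value, source) for a load; source names the path taken."""
    idx = pred.predict(load_pc)
    if idx is not None:
        entry = sq.read(idx)  # one indexed read, no associative search
        # Simplification: an address match is accepted as correct here;
        # this model does not check for a younger matching store.
        if entry is not None and entry.addr == addr:
            return entry.data, "predicted"
    # No usable prediction: find the true source and train on it.
    true_idx = sq.search_youngest(addr)
    if true_idx is not None:
        entry = sq.read(true_idx)
        pred.train(load_pc, entry.pc)
        return entry.data, "trained"
    return memory.get(addr, 0), "memory"
```

In this model the first dynamic instance of a load falls back to the search and trains the predictor; later instances fed by the same static store are served by a single indexed read, which is the behavior the abstract attributes to loads that repeatedly source from the same store.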
Dynamically scheduled high-level synthesis (HLS) enables the use of load-store queues (LSQs) which c...
Store misses cause significant delays in shared-memory multiprocessors because of limited store buff...
The load-store unit is a performance critical component of a dynamically-scheduled processor. It is ...
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding....
In an out-of-order core, the load queue (LQ), the store queue (SQ), and the store buffer (SB) are re...
As FPGAs continue to increase in size, it becomes increasingly feasible and desirable to bu...
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
Conventional dynamically scheduled processors often use fully associative structures named load/stor...
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
Because they are based on large content-addressable memories, load-store queues (LSQs) present imple...
Conventional superscalar processors usually contain a large CAM-based LSQ (load/store queue) with poor...
The load-store queue (LQ-SQ) of modern superscalar processors is responsible for keeping the order of...
This paper presents NoSQ (short for No Store Queue), a microarchitecture that performs store-load co...
CPR/CFP (Checkpoint Processing and Recovery/Continual Flow Pipeline) support an adaptive instruction...
Various memory consistency model implementations (e.g., x86, SPARC) willfully allow a core to see it...