The load-store unit is a performance-critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques use some form of speculation to simplify the load-store unit and check this speculation by re-executing some of the loads prior to commit. We call such techniques load optimizations. One recent load optimization improves load queue (LQ) scalability by using re-execution rather than associative search to check speculative intra- and inter-thread memory ordering. A second technique improves store queue (SQ) scalability by speculatively filtering some load accesses and some store entries from it and re-executing loads to check that speculation. A third technique spe...
Virtually all processors today employ a store buffer (SB) to hide store latency. However, when the s...
Out-of-order processors heavily rely on speculation to achieve high performance, allowing instructio...
Because they are based on large content-addressable memories, load-store queues (LSQs) present imple...
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
Speculative parallelization (SP) enables a processor to extract multiple threads from a single seque...
In an out-of-order core, the load queue (LQ), the store queue (SQ), and the store buffer (SB) are re...
Future multi-core and many-core processors are likely to contain one or more high performance out-of...
Conventional dynamically scheduled processors often use fully associative structures named load/stor...
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding....
In a dynamic reordering superscalar processor, the front-end fetches instructions and places them in...
CPR/CFP (Checkpoint Processing and Recovery/Continual Flow Pipeline) support an adaptive instruction...
The use of large instruction windows coupled with aggressive out-of-order and prefetching capabiliti...
The load-store queue (LQ-SQ) of modern superscalar processors is responsible for keeping the order of...