CPR/CFP (Checkpoint Processing and Recovery/Continual Flow Pipeline) support an adaptive instruction window that scales to tolerate last-level cache misses. CPR/CFP scale the register file by aggressively reclaiming the destination registers of many in-flight instructions. However, no analogous mechanism exists for stores and loads. As the window expands, CPR/CFP processors must track all in-flight stores and loads to support store-to-load forwarding and to detect memory ordering violations. The previously described SVW (Store Vulnerability Window) and SQIP (Store Queue Index Prediction) schemes provide scalable, non-associative load and store queues, respectively. However, they do not integrate cleanly into a CPR/CFP context. SVW/SQIP rely on the ability ...
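To make concrete why tracking every in-flight load is costly, here is a minimal sketch of the conventional associative load-queue check for memory ordering violations: when a store's address resolves, every entry is searched for a younger load to the same address that has already executed (and therefore read stale data). This is the generic baseline, not SVW or SQIP; the class and method names (`LoadQueue`, `store_resolved`) and the use of integer sequence numbers for program order are hypothetical, and the check is deliberately conservative (exact-address match only, no partial overlaps, and no exemption for loads that forwarded from an intervening store).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadEntry:
    seq: int                    # hypothetical program-order sequence number
    addr: Optional[int] = None  # resolved address; None until the load executes
    executed: bool = False


class LoadQueue:
    """Hypothetical associative load queue: each resolved store searches all entries."""

    def __init__(self) -> None:
        self.entries: List[LoadEntry] = []

    def allocate(self, seq: int) -> LoadEntry:
        entry = LoadEntry(seq=seq)
        self.entries.append(entry)
        return entry

    def execute(self, entry: LoadEntry, addr: int) -> None:
        entry.addr = addr
        entry.executed = True

    def store_resolved(self, store_seq: int, store_addr: int) -> Optional[int]:
        """Return the seq of the oldest younger load that already read store_addr
        (the squash/replay point), or None if no violation is detected.
        Conservative: ignores partial overlaps and intervening forwarding stores."""
        violators = [e.seq for e in self.entries
                     if e.executed and e.seq > store_seq and e.addr == store_addr]
        return min(violators) if violators else None


# Usage: loads 9 and 12 execute early to 0x100; load 5 has not executed yet.
lq = LoadQueue()
lq.allocate(5)
b = lq.allocate(9)
c = lq.allocate(12)
lq.execute(b, 0x100)
lq.execute(c, 0x100)
print(lq.store_resolved(7, 0x100))   # 9: squash from load 9 onward
print(lq.store_resolved(3, 0x200))   # None: no executed load read 0x200
```

The linear scan over `entries` stands in for a CAM search; its cost grows with the number of in-flight loads, which is the scaling pressure the abstract describes.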
Sharing a physical register between several instructions is needed to implemen...
The ever-increasing computational power of contemporary microprocessors reduces the execution time s...
Large instruction window processors achieve high performance by exposing large amounts of instructio...
The load-store unit is a performance-critical component of a dynamically-scheduled processor. It is ...
CPR (Checkpoint Processing and Recovery) is a physical register management scheme that supports a la...
An efficient mechanism to track and enforce memory dependences is crucial to an out-of-order micropr...
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
High-frequency memory checkpointing is an important technique in several application domains, such a...
Growing concerns about power have revived interest in in-order pipelines. In-order pipelines sacrifi...
Conventional dynamically scheduled processors often use fully associative structures named load/stor...
Modern processors use CAM-based load and store queues (LQ/SQ) to support out-of-order memory schedul...
Modern out-of-order processors tolerate long-latency memory operations by supporting a large number ...
Various memory consistency model implementations (e.g., x86, SPARC) willfully allow a core to see it...
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding....
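The associative store-load forwarding search mentioned above can be sketched as follows: a load probes the store queue for the youngest store that is older than it and writes the same address, and takes that store's data instead of reading the cache. This is a simplified model under stated assumptions: the names (`StoreQueue`, `forward`) are hypothetical, program order is modeled with integer sequence numbers, only exact-address matches are handled, and the policy of stalling on an older store with an unknown address or data is one conservative choice (speculative designs proceed and rely on ordering-violation detection instead).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreEntry:
    seq: int                    # hypothetical program-order sequence number
    addr: Optional[int] = None  # None until the store's address is computed
    data: Optional[int] = None  # None until the store's data is available


class StoreQueue:
    """Hypothetical associative SQ; entries are kept oldest first."""

    def __init__(self) -> None:
        self.entries: List[StoreEntry] = []

    def allocate(self, seq: int) -> StoreEntry:
        entry = StoreEntry(seq)
        self.entries.append(entry)
        return entry

    def forward(self, load_seq: int, load_addr: int) -> Tuple[str, Optional[int]]:
        # Scan youngest to oldest: the software analogue of a priority CAM search.
        for e in reversed(self.entries):
            if e.seq >= load_seq:
                continue  # store is younger than the load, so it is not visible
            if e.addr is None:
                return ("stall", None)  # conservative: older address still unknown
            if e.addr == load_addr:
                if e.data is None:
                    return ("stall", None)  # match found, but data not ready yet
                return ("forward", e.data)
        return ("cache", None)  # no older matching store: read the data cache


# Usage: three stores to 0x40 at seq 2, 4 and 8.
sq = StoreQueue()
for seq, value in [(2, 11), (4, 22), (8, 33)]:
    s = sq.allocate(seq)
    s.addr, s.data = 0x40, value
print(sq.forward(6, 0x40))  # ('forward', 22): youngest older store is seq 4
print(sq.forward(6, 0x80))  # ('cache', None): no matching older store
```

Because every load must compare against all older in-flight stores in a single cycle, the search width grows with the window, which is why the abstracts above look for alternatives to a fully-associative SQ.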