Conventional superscalar processors usually contain a large CAM-based LSQ (load/store queue) with poor scalability and high energy consumption. Recent proposals focus only on improving LSQ scalability to increase in-flight instruction capacity, but they deliver little performance improvement and poor energy efficiency. This paper presents a novel speculative store-load forwarding mechanism named SOLE (speculative one-cycle load execution)(1). First, SOLE uses address identifiers, rather than the exact memory addresses used by a traditional LSQ, to perform memory disambiguation. Since an address identifier is just a simple hash of the address base and offset, speculative store-load forwarding can be performed earlier to reduce th...
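The abstract gives only the outline of identifier-based forwarding, so the following is a minimal sketch under stated assumptions, not SOLE's design. It assumes the identifier hashes the base register number and the immediate offset (which would make it available before address generation), uses an arbitrary shift-XOR hash, and adds a later `verify` step that checks each speculative forward against exact addresses. The names `address_id`, `IdForwardingSQ`, `ID_BITS` and `verify` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ID_BITS = 10  # hypothetical identifier width; not given in the abstract


def address_id(base_reg: int, offset: int) -> int:
    """Hypothetical identifier: shift-XOR of the base register number and the
    immediate offset, truncated to ID_BITS. Not SOLE's actual hash."""
    return ((base_reg << 5) ^ offset) & ((1 << ID_BITS) - 1)


@dataclass
class StoreEntry:
    seq: int                    # program-order sequence number
    addr_id: int                # identifier, available before address generation
    data: int
    addr: Optional[int] = None  # exact address, filled in once computed


class IdForwardingSQ:
    def __init__(self) -> None:
        self.entries: List[StoreEntry] = []  # kept in program order

    def insert_store(self, seq: int, base_reg: int, offset: int, data: int) -> None:
        self.entries.append(StoreEntry(seq, address_id(base_reg, offset), data))

    def resolve_address(self, seq: int, addr: int) -> None:
        for st in self.entries:
            if st.seq == seq:
                st.addr = addr
                return
        raise KeyError(seq)

    def speculative_forward(self, load_seq: int, base_reg: int,
                            offset: int) -> Optional[Tuple[int, int]]:
        """Forward from the youngest older store whose identifier matches.
        Returns (store_seq, data), or None to read the cache instead."""
        lid = address_id(base_reg, offset)
        for st in reversed(self.entries):
            if st.seq < load_seq and st.addr_id == lid:
                return st.seq, st.data
        return None

    def verify(self, load_seq: int, load_addr: int,
               forwarded_seq: Optional[int]) -> bool:
        """Recompute the correct source by exact-address match once the older
        store addresses are known; False means the load must be replayed."""
        for st in reversed(self.entries):
            if st.seq < load_seq:
                if st.addr is None:
                    raise ValueError("older store address not yet resolved")
                if st.addr == load_addr:
                    return st.seq == forwarded_seq
        return forwarded_seq is None


if __name__ == "__main__":
    sq = IdForwardingSQ()
    sq.insert_store(1, base_reg=5, offset=8, data=42)
    sq.insert_store(2, base_reg=6, offset=0, data=7)
    hit = sq.speculative_forward(3, base_reg=5, offset=8)
    print(hit)                           # (1, 42)
    sq.resolve_address(1, 0x1008)
    sq.resolve_address(2, 0x2000)
    print(sq.verify(3, 0x1008, hit[0]))  # True
    sq.resolve_address(2, 0x1008)        # store 2 aliases through another register
    print(sq.verify(3, 0x1008, hit[0]))  # False: the load must be replayed
```

The check is needed in both directions: two different (base, offset) pairs can name the same address, which causes a missed forward, and two different addresses can hash to the same identifier, which causes a false forward. In this sketch both cases show up as a `verify` failure.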
High-performance processors use a large set-associative L1 data cache with multiple ports. As clock ...
In most modern processor designs, the hardware dedicated to storing data and instructions (memory hierarchy)...
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
Because they are based on large content-addressable memories, load-store queues (LSQ) present implem...
The load-store queue (LQ-SQ) of modern superscalar processors is responsible for keeping the order of...
Conventional dynamically scheduled processors often use fully associative structures named load/stor...
The load/store queue (LSQ) is one of the most complex parts of contemporary processors. Its latency ...
Future multi-core and many-core processors are likely to contain one or more high-performance out-of...
Keywords: memory disambiguation, load forwarding, speculation. The superscalar processor must issue instruction...
A large instruction window is a key requirement for exploiting greater instruction-level parallelism in ...
In an out-of-order core, the load queue (LQ), the store queue (SQ), and the store buffer (SB) are re...
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
Modern processors use CAM-based load and store queues (LQ/SQ) to support out-of-order memory schedul...
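One common role of the CAM-based load queue is detecting memory-ordering violations. When a store's address resolves, the LQ is searched for younger loads that already executed to the same address and read a stale value. The following is a minimal sketch of that search, assuming exact-address matching with no partial overlaps and non-wrapping sequence numbers. `LoadQueue`, `store_resolved` and `src_store` are hypothetical names; the truncated abstract does not describe any specific design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadEntry:
    seq: int                    # program-order sequence number
    addr: Optional[int] = None  # None until the load executes
    src_store: int = -1         # seq of the store it was forwarded from; -1 = cache


class LoadQueue:
    def __init__(self) -> None:
        self.entries: List[LoadEntry] = []

    def dispatch(self, seq: int) -> None:
        self.entries.append(LoadEntry(seq))

    def execute(self, seq: int, addr: int, src_store: int = -1) -> None:
        for ld in self.entries:
            if ld.seq == seq:
                ld.addr, ld.src_store = addr, src_store
                return
        raise KeyError(seq)

    def store_resolved(self, store_seq: int, store_addr: int) -> Optional[int]:
        """Associative search (a CAM lookup in hardware, sequential here).

        Returns the seq of the oldest younger load that already read store_addr
        without seeing this store's value (it read the cache or an even older
        store), i.e. the point to squash and re-execute from; None if no violation.
        """
        victims = [ld.seq for ld in self.entries
                   if ld.seq > store_seq
                   and ld.addr == store_addr
                   and ld.src_store < store_seq]
        return min(victims, default=None)
```

A load that was forwarded from a store that is younger than the resolving store (but still older than the load) is not flagged, because the value it received is already newer than the one the resolving store would supply.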
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding....
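The fully associative SQ search behind conventional store-load forwarding can be sketched as follows: the load's address is compared against every older store, and the youngest match supplies the data. This minimal sketch assumes exact, full-width accesses with no partial overlaps. It also uses a conservative stall policy for unresolved older stores, whereas many real designs speculate past them with a dependence predictor. The names `StoreEntry` and `forward` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreEntry:
    seq: int                    # program-order sequence number
    addr: Optional[int] = None  # None until the store's address is generated
    data: Optional[int] = None  # None until the store's data is ready


def forward(sq: List[StoreEntry], load_seq: int,
            load_addr: int) -> Tuple[str, Optional[int]]:
    """Find the value a load should receive from the store queue.

    Hardware compares load_addr against every SQ entry in parallel (the CAM)
    and a priority encoder picks the youngest older match. This loop visits
    older stores from youngest to oldest to get the same answer sequentially.
    Returns ("forward", data), ("cache", None) or ("stall", None).
    """
    older = sorted((s for s in sq if s.seq < load_seq),
                   key=lambda s: s.seq, reverse=True)
    for st in older:
        if st.addr is None or (st.addr == load_addr and st.data is None):
            # Conservative policy (an assumption): wait rather than speculate
            # past an unresolved older store or a match whose data is not ready.
            return ("stall", None)
        if st.addr == load_addr:
            return ("forward", st.data)
    return ("cache", None)
```

The parallel compare over every entry is what makes this structure costly in energy and hard to scale as the queue grows. That cost is the scalability problem the abstracts in this list address.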
To alleviate the memory wall problem, current architectural trends suggest implementing large instr...