The load-store queue (LQ-SQ) of modern superscalar processors is responsible for enforcing the ordering of memory operations. As the performance gap between processor speed and memory access latency widens, the capacity requirements of the LQ-SQ grow, and its CAM-based structure makes it increasingly difficult to design. In this paper we propose an efficient load-store queue state filtering mechanism that provides a significant energy reduction (on average 35% in the LSQ and 3.5% in the whole processor) while incurring a negligible performance loss of less than 0.6%.
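The abstract does not say how the state filtering works, so the following is only a rough illustration of why filtering can save CAM energy, not the paper's mechanism. It models a store queue whose associative search for store-to-load forwarding is guarded by a small table of counters, a counting-filter technique chosen here for illustration. The class `FilteredStoreQueue`, the counter table, the bucket hash on low address bits, and the `cam_compares` energy proxy are all hypothetical. The sketch assumes aligned, same-size accesses (only exact address matches forward) and does not model squashes, which a real design would also have to decrement.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    seq: int   # program-order sequence number (smaller = older)
    addr: int  # aligned word address (hypothetical simplification)
    data: int


class FilteredStoreQueue:
    """Hypothetical store queue with a counting filter in front of the CAM search."""

    def __init__(self, filter_bits: int = 6):
        self.entries: deque[StoreEntry] = deque()    # stores in program order
        self.mask = (1 << filter_bits) - 1
        self.counters = [0] * (1 << filter_bits)      # in-flight stores per bucket
        self.cam_compares = 0                          # proxy for CAM search energy
        self.searches_skipped = 0

    def _bucket(self, addr: int) -> int:
        return addr & self.mask

    def insert_store(self, seq: int, addr: int, data: int) -> None:
        # Stores are dispatched in program order, so appending keeps age order.
        self.entries.append(StoreEntry(seq, addr, data))
        self.counters[self._bucket(addr)] += 1

    def retire_oldest_store(self) -> StoreEntry:
        # Stores commit in order: remove the oldest and update the filter.
        st = self.entries.popleft()
        self.counters[self._bucket(st.addr)] -= 1
        return st

    def load_lookup(self, seq: int, addr: int) -> Optional[int]:
        """Return data from the youngest older store to addr, or None."""
        if self.counters[self._bucket(addr)] == 0:
            # A zero counter guarantees no in-flight store matches: skip the search.
            self.searches_skipped += 1
            return None
        match = None
        for st in self.entries:  # a hardware CAM compares all entries in parallel
            self.cam_compares += 1
            if st.seq < seq and st.addr == addr:
                match = st       # entries are oldest-first, so the last hit is youngest
        return match.data if match else None


sq = FilteredStoreQueue()
sq.insert_store(seq=1, addr=0x100, data=11)
sq.insert_store(seq=3, addr=0x100, data=33)
print(sq.load_lookup(seq=4, addr=0x100))  # 33: youngest older store forwards
print(sq.load_lookup(seq=5, addr=0x204))  # None: filter skips the CAM search
```

Because the counters are exact per bucket, the filter can report false positives (two addresses sharing a bucket still trigger a search) but never false negatives, so skipping a search never breaks memory ordering in this model.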
A store queue (SQ) is a critical component of the load execution machinery. High ILP processors requ...
Future multi-core and many-core processors are likely to contain one or more high performance out-of...
Conventional dynamically scheduled processors often use fully associative structures named load/stor...
Because they are based on large content-addressable memories, load-store queues (LSQs) present imple...
A large instruction window is a key requirement for exploiting greater instruction-level parallelism in ...
In most modern processor designs, the hardware dedicated to storing data and instructions (the memory hierarchy)...
Conventional superscalar processors usually contain a large CAM-based LSQ (load/store queue) with poor...
Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-leve...
High-performance processors use a large set-associative L1 data cache with multiple ports. As clock ...
This paper describes several methods for improving the scalability of memory disambiguation hardware...
The load/store queue (LSQ) is one of the most complex parts of contemporary processors. Its latency ...
In an out-of-order core, the load queue (LQ), the store queue (SQ), and the store buffer (SB) are re...
In high-end processors, increasing the number of in-flight instructions can improve performance by o...
One of the main challenges of modern processor designs is the implementation of scalable and efficie...