This paper introduces the notion of silent loads, which are load accesses that can be satisfied by values already available in the physical register file, and proposes a new architectural concept to exploit them. It then unifies different approaches to eliminating memory accesses early through a new architectural scheme. We show that the unified approach covers previously proposed techniques for exploiting forwarded and small-value loads in addition to silent loads. Forwarded loads obtain their values through load-to-load or store-to-load forwarding, whereas small-value loads return values that can be encoded in 8 bits or fewer. We find that 22%, 31% and 24% of all dynamic loads are forwarded, small-value and silent,...
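As a rough illustration of the three load categories named in the abstract, the following Python sketch tags a single dynamic load. It is a toy software model under stated assumptions, not the architectural scheme the paper proposes. The names `Load`, `classify_load`, `in_flight_addrs` and `register_values` are hypothetical. The 8-bit check assumes an unsigned encoding. The abstract does not say whether the categories are mutually exclusive, so the sketch returns a set of tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    """A single dynamic load (hypothetical trace record)."""
    addr: int   # effective address of the load
    value: int  # value the load returns


def fits_in_8_bits(value: int) -> bool:
    # Assumption: "coded with 8 bits or less" is modelled as an
    # unsigned value in the range [0, 255].
    return 0 <= value <= 0xFF


def classify_load(load: Load, in_flight_addrs: set, register_values: set) -> set:
    """Return the categories that apply to `load`.

    in_flight_addrs: addresses of older in-flight loads and stores,
        standing in for load-to-load / store-to-load forwarding sources.
    register_values: values currently held in the physical register
        file (assumed to be visible to the classifier).
    """
    tags = set()
    if load.addr in in_flight_addrs:
        tags.add("forwarded")
    if fits_in_8_bits(load.value):
        tags.add("small-value")
    if load.value in register_values:
        tags.add("silent")
    return tags


if __name__ == "__main__":
    example = Load(addr=0x1000, value=7)
    print(sorted(classify_load(example, {0x1000}, {7, 99})))
```

Returning a set rather than a single label keeps the sketch neutral on how overlapping cases are counted, which the abstract leaves open.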
Register window is an architectural technique that reduces memory operations required to save and re...
The storage for speculative values in superscalar processors is one of the main sources of complexit...
In high-end processors, increasing the number of in-flight instructions can improve performance by o...
Execution efficiency of memory instructions remains critically important. To this end, a plethora of...
As multicore architectures have hit the mainstream, one of the challenges for future multicore desig...
Because they are based on large content-addressable memories, load-store queues (LSQs) present imple...
The considerable gap between processor and DRAM speed and the power losses in the cache hierarchy ca...
In modern architectures the register file is one of the most energy consuming and frequently used co...
The detection of opportunities for value reuse optimizations in memory operations requires both the a...
The speed gap between processor and memory continues to limit performance. To address this problem, ...
Memory encryption has so far often had too much overhead to be practical. If it were possible to red...
Today’s superscalar microprocessors use large, heavily-ported physical register files (RFs) to incre...