On-chip caches maintain multiple pieces of metadata about each cached block, e.g., dirty bit, coherence information, ECC. Traditionally, such metadata for each block is stored in the corresponding tag entry in the tag store. While this approach is simple to implement and scalable, it necessitates a full tag store lookup for any metadata query, resulting in high latency and energy consumption. We find that this approach is inefficient and inhibits several cache optimizations. In this work, we propose a new way of organizing the dirty bit information that enables simpler and more efficient implementations of several optimizations. In our proposed approach, we remove the dirty bits from the tag store and organize them differently in a separate struc...
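The abstract above is cut off before it describes the separate structure, so the following is only a minimal sketch of the general idea: dirty bits held outside the tag store so that a dirty-status query needs no tag lookup. The class name `SeparateDirtyStore`, the grouping of blocks into fixed-size regions, and the `blocks_per_region` parameter are all assumptions made for illustration, not the authors' design.

```python
class SeparateDirtyStore:
    """Hypothetical dirty-bit structure kept apart from the tag store.

    Assumption: blocks are grouped into fixed-size regions and each
    region holds one bit vector. The source does not specify this layout.
    """

    def __init__(self, blocks_per_region=64):
        self.blocks_per_region = blocks_per_region
        self._bits = {}  # region id -> integer used as a bit vector

    def _split(self, block_addr):
        # Region id and the block's bit position inside that region.
        return divmod(block_addr, self.blocks_per_region)

    def mark_dirty(self, block_addr):
        region, offset = self._split(block_addr)
        self._bits[region] = self._bits.get(region, 0) | (1 << offset)

    def mark_clean(self, block_addr):
        region, offset = self._split(block_addr)
        remaining = self._bits.get(region, 0) & ~(1 << offset)
        if remaining:
            self._bits[region] = remaining
        else:
            # Drop regions that no longer hold any dirty block.
            self._bits.pop(region, None)

    def is_dirty(self, block_addr):
        # Answered from this structure alone, without touching tag entries.
        region, offset = self._split(block_addr)
        return bool((self._bits.get(region, 0) >> offset) & 1)

    def dirty_blocks_in_region(self, region):
        # One bit-vector scan lists every dirty block in the region.
        vec = self._bits.get(region, 0)
        base = region * self.blocks_per_region
        return [base + i for i in range(self.blocks_per_region) if (vec >> i) & 1]
```

In this sketch, a query such as `is_dirty(addr)` or `dirty_blocks_in_region(r)` reads a single bit vector, whereas with dirty bits stored in tag entries the same query would require looking up each block's tag entry.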
STT-RAM (Spin-Transfer Torque Random Access Memory) appears to be a viable alternative to SRAM-based...
The increasing use of microprocessor cores in embedded systems as well as mobile and portable device...
Parallel accesses to the table lookaside buffer (TLB) and cache array are crucial for hi...
We characterize the cache behavior of an in-memory tag table and demonstrate that an optimized imple...
DRAM caches have shown excellent potential in capturing the spatial and temporal data locality of ap...
Hybrid main memories composed of DRAM as a cache to scalable non-volatile memories such as phase-cha...
DRAM caches have been shown to be an effective way to utilize the bandwidth and capacity of 3D stack...
In this paper, we present Bi-Modal Cache - a flexible stacked DRAM cache organization which simultan...
The predictability of memory access patterns in embedded systems can be successfully exploited to de...
Recent research advocates large die-stacked DRAM caches in manycore servers to break the me...
© 2017 Association for Computing Machinery. Placing the DRAM in the same package as a processor enab...
We introduce a set of new Compression-Aware Management Policies (CAMP) for on-chip caches that emplo...
Power consumption in current high-performance chip multiprocessors (CMPs) has become a major de...
This article describes and evaluates a new approach to optimizing DRAM performance and energy consum...
We propose a novel energy-efficient memory architecture which relies on the use of cache with a redu...