Recent studies estimate that server cost accounts for as much as 57% of the total cost of ownership (TCO) of a datacenter [1]. One key contributor to this high server cost is the procurement of memory devices such as DRAMs, especially for data-intensive datacenter cloud applications that need low latency (such as web search, in-memory caching, and graph traversal). Such memory devices, however, may be prone to hardware errors that occur due to unintended bit flips during device operation [40, 33, 41, 20]. To protect against such errors, traditional systems uniformly employ devices with high-quality chips and error correction techniques, both of which increase device cost. At the same time, we make the observations that 1) data-intensi...
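One concrete reason error correction raises memory cost is the extra storage it needs for check bits. The sketch below is illustrative only and is not the method of the work quoted above. It assumes a plain single-error-correcting, double-error-detecting (SECDED) extended Hamming code; server-grade schemes such as Chipkill use different layouts and overheads. The helper names `secded_check_bits` and `storage_overhead` are hypothetical. For 64-bit data words the code needs 8 check bits, which matches the familiar 72-bit-wide ECC DIMM and a 12.5% storage overhead.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for an extended Hamming (SECDED) code over `data_bits` bits.

    Illustrative assumption: a plain SECDED code, not Chipkill or other
    symbol-based schemes used in many servers.
    """
    r = 0
    # Hamming condition: r check bits must distinguish every one of the
    # data_bits + r bit positions, plus the "no error" case.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall-parity bit upgrades single-error correction
    # to additional double-error detection.
    return r + 1


def storage_overhead(data_bits: int) -> float:
    """Fraction of extra storage spent on check bits."""
    return secded_check_bits(data_bits) / data_bits


if __name__ == "__main__":
    for k in (8, 32, 64, 128):
        print(f"{k:>4} data bits -> {secded_check_bits(k)} check bits "
              f"({storage_overhead(k):.1%} overhead)")
```

Wider words amortize the check bits better (for example, 9 check bits for 128 data bits, about 7%), which is one reason ECC is applied per 64-bit word or wider rather than per byte.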
The key objective of database systems is to reliably manage data, whereby high query throughput and ...
Continued scaling of DRAM technologies induces more faulty DRAM cells than before. These inherent fa...
One of the main causes of hardware failure in large-scale clusters is an uncorrected error in main ...
Memory devices represent a key component of datacenter total cost of ownership (TCO), and techniq...
Computing systems use dynamic random-access memory (DRAM) as main memory. As prior works have sho...
The memory hierarchy is predicted to consume up to 40% to 70% of total system power in future data c...
With a need to deliver the highest-quality products operating in all environments, cope with small and u...
DRAM scaling has been the prime driver for increasing the capacity of main memory systems over the p...
Emerging workloads in cloud and data center infrastructures demand high main memory bandwidth and ca...
Future computing platforms will increasingly demand more stringent memory resiliency mechanisms ...
As device technologies scale in the nanometer era, the current off-chip DRAM technologies are very c...
Memory reliability has been a major design constraint for mission-critical and large-scale systems f...
Memory system reliability is a serious and growing concern in modern servers. Existing chip...
Hardware vendors constantly decrease the feature sizes of integrated circuits to obtain better perfo...
Several recent publications have shown that hardware faults in the memory subsystem are commonplace....