DRAM scaling has been the prime driver for increasing the capacity of main memory systems over the past three decades. Unfortunately, scaling DRAM to smaller technology nodes has become challenging due to the inherent difficulty in designing smaller geometries, coupled with the problems of device variation and leakage. Future DRAM devices are likely to experience significantly higher error rates. Techniques that can tolerate errors efficiently can enable DRAM to scale to smaller technology nodes. However, existing techniques such as row/column sparing and ECC become prohibitive at high error rates. To develop cost-effective solutions for tolerating high error rates, this paper advocates a cross-layer approach. Rather than hiding the fault...
The problem of soft errors caused by radiation events is expected to get worse with technology scal...
Today's DRAM process is expected to continue scaling, enabling minimum feature sizes...
Several recent publications confirm that faults are common in high-performance computing systems. Th...
For decades, main memory has enjoyed the continuous scaling of its physical substrate: DRAM (Dynamic...
Continued scaling of DRAM technologies induces more faulty DRAM cells than before. These inherent fa...
Die-stacked DRAM can provide large amounts of in-package, high-bandwidth cache storage. For server a...
With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has beco...
Computing systems use dynamic random-access memory (DRAM) as main memory. As prior works have sho...
Reliability of a memory subsystem is one of the most important features for computer system stab...
Aggressive process scaling and increasing demands of performance/cost efficiency have exacerbated th...
Recent studies estimate that server cost contributes to as much as 57% of the total cost of ownersh...
Memory reliability has been a major design constraint for mission-critical and large-scale systems f...
Several recent publications have shown that hardware faults in the memory subsystem are commonplace....
Supercomputers offer new opportunities for scientific computing as they grow in size. However, their...
An application may have different sensitivity to faults in different subsets of the data it uses. So...