Several recent publications confirm that faults are common in high-performance computing systems. Therefore, further attention to the faults experienced by such computing systems is warranted. In this paper, we present a study of DRAM and SRAM faults in large high-performance computing systems. Our goal is to understand the factors that influence faults in production settings. We examine the impact of aging on DRAM, finding a marked shift from permanent to transient faults in the first two years of DRAM lifetime. We examine the impact of DRAM vendor, finding that fault rates vary by more than 4x among vendors. We examine the physical location of faults in a DRAM device and in a data center; contrary to prior studies, we find no correlatio...
Abstract: High speed DRAMs today suffer from an increased sensitivity to interference and noise prob...
In recent years, embedded memories are the fastest growing segment of system on chip. They therefore...
Aggressive process scaling and increasing demands of performance/cost efficiency have exacerbated th...
Several recent publications have shown that hardware faults in the memory subsystem are commonplace....
Computing systems use dynamic random-access memory (DRAM) as main memory. As prior works have sho...
This paper summarizes our two-year study of corrected and uncorrected errors on the MareNostrum 3 s...
Dynamic random access memory (DRAM) is the most widely used type of memory in the consumer market to...
For decades, main memory has enjoyed the continuous scaling of its physical substrate: DRAM (Dynamic...
Supercomputers offer new opportunities for scientific computing as they grow in size. However, their...
DRAM scaling has been the prime driver for increasing the capacity of main memory system over the p...
Abstract: DRAM testing has always been theoretically considered as a subset of general memory testin...
In this paper, we present a novel study on Data Retention Faults (DRFs) in SRAM memories. We analyze...
As process technology scales down to smaller dimensions, DRAM chips become more vulnerable to distur...
Abstract: Memory testing in general, and DRAM testing in particular, has become greatly dependent on...
Fault analysis is an important step in establishing detailed fault models or subsequent ...