An application may have different sensitivity to faults in different subsets of the data it uses. Some data regions may therefore be more critical than others. Capitalizing on this observation, Odd-ECC provides a mechanism to dynamically select the memory fault tolerance of each allocated page of a program on demand depending on the criticality of the respective data. Odd-ECC error correcting codes (ECCs) are stored in separate physical pages and hidden by the OS as pages unavailable to the user. Still, these ECCs are physically aligned with the data they protect so the memory controller can efficiently access them. Thereby, capacity, performance and energy overheads of memory fault tolerance are proportional to the criticality of the data ...
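The proportionality claim can be illustrated with a toy capacity model. This is only a sketch under stated assumptions, not the Odd-ECC implementation: the page size, the `Protection` levels, and the `STRONG` overhead ratio are hypothetical; only the 1/8 ratio for SECDED follows from the standard (72,64) code.

```python
from enum import Enum

PAGE_SIZE = 4096  # bytes; assumed page size


class Protection(Enum):
    """Hypothetical per-page protection levels (ECC bytes per data byte)."""
    NONE = 0.0
    SECDED = 1 / 8   # 8 check bits per 64 data bits, as in a (72,64) code
    STRONG = 2 / 8   # illustrative stronger code; the ratio is an assumption


def ecc_bytes(levels):
    """Total ECC storage needed for a list of per-page protection levels."""
    return sum(int(PAGE_SIZE * level.value) for level in levels)


def capacity_overhead(levels):
    """ECC storage as a fraction of the data capacity of the pages."""
    return ecc_bytes(levels) / (PAGE_SIZE * len(levels))


# 100 pages: 10 critical (STRONG), 20 moderately critical (SECDED), 70 unprotected.
selective = (
    [Protection.STRONG] * 10
    + [Protection.SECDED] * 20
    + [Protection.NONE] * 70
)
# Baseline: every page gets the strongest protection.
uniform = [Protection.STRONG] * 100

print(f"selective overhead: {capacity_overhead(selective):.2%}")
print(f"uniform overhead:   {capacity_overhead(uniform):.2%}")
```

In this model the ECC storage grows only with the number of pages marked critical, whereas the uniform baseline pays the full cost for every page.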
Technology advancements have enabled the integration of large on-die embedded DRAM (eDRAM) caches. e...
Memory reliability has been a major design constraint for mission-critical and large-scale systems f...
In this talk we investigate a number of on-chip coding techniques for the protection of Random Acce...
Continued scaling of DRAM technologies induces more faulty DRAM cells than before. These inherent fa...
Most server-grade memory systems provide Chipkill-Correct error protection at the expense of power a...
Servers and HPC systems often use a strong memory error correction code, or ECC, to meet their relia...
Die-stacked DRAM can provide large amounts of in-package, high-bandwidth cache storage. For server a...
Because main memory is vulnerable to errors and failures, large-scale systems and critical servers u...
Future computing platforms will increasingly demand more stringent memory resiliency mechanisms ...
Growing computer system sizes and levels of integration have made memory reliability a primary conce...
Memory system reliability is a serious and growing concern in modern servers. Existing chip...
Post-silicon healing techniques that rely on built-in redundancy (e.g. row/column redundanc...
Memory protection is necessary to ensure the correctness of data in the presence of unavoidable faul...