As high-performance computing (HPC) continues to progress, constraints on HPC system design force the handling of errors to higher levels in the software stack. Of the types of errors facing HPC, soft errors that silently corrupt system or application state are among the most severe. Understanding the behavior of HPC applications in the presence of soft errors is critical to using HPC systems effectively. This understanding can guide the development of algorithm-based error detection, informed by application characteristics obtained from fault injection and error propagation studies. Furthermore, the realization that applications are tolerant to small errors allows optimizations such as lossy compression on high-cost data tr...
Technology and voltage scaling is making integrated circuits increasingly susceptible to failures ca...
Hardware errors are projected to increase in modern computer systems due to shrinking feature sizes ...
Due to improvements in high-performance computing (HPC) systems, researchers have created powerful a...
Hardware errors are on the rise with reducing chip sizes, and power constraints have necessitated th...
Many methods are available to detect silent errors in high-performance computing (HPC) app...
Transient hardware faults have become one of the major concerns affecting the reliability of modern ...
According to Moore’s law, technology scaling is continuously providing smaller and faster devices. T...
Traditionally, fault tolerance researchers have made very strict assumptions about program correctne...
HPC systems are widely used in industrial, economical, and scientific applications, and many of thes...
The strict power efficiency constraints required to achieve exascale systems will dramatically incre...
As late-CMOS process scaling leads to increasingly variable circuits/logic and as most post-CMOS tec...
In the modern era of computing, processors are increasingly susceptible to soft errors. Current solu...