As high-performance computing (HPC) continues to progress, constraints on HPC system design force the handling of errors to higher levels in the software stack. Of the types of errors facing HPC, soft errors that silently corrupt system or application state are among the most severe. Understanding how HPC applications behave in the presence of soft errors is critical to using HPC systems effectively. That understanding can guide the development of algorithm-based error detection informed by application characteristics from fault injection and error propagation studies. Furthermore, the realization that applications are tolerant to small errors allows optimizations such as lossy compression on high-cost data tr...
The coming exascale era is a great opportunity for high performance computing (HPC) applications. Ho...
High Performance Computing (HPC) applications are always expanding in data size and computational co...
Microprocessors are increasingly used in a variety of applications from small handheld calculators t...
Due to improvements in high-performance computing (HPC) systems, researchers have created powerful a...
The rising count and shrinking feature size of transistors within modern computers are making them in...
Many methods are available to detect silent errors in high-performance computi...
Graduation thesis (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Comput...
Current scaling trends in transistor technology, in pursuit of larger component counts a...
In the modern era of computing, processors are increasingly susceptible to soft errors. Current solu...
Traditionally, fault tolerance researchers have made very strict assumptions about program correctne...
Many methods are available to detect silent errors in high-performance computing (HPC) app...
Dependability and fault tolerance are important aspects of modern computer systems. Particle str...
Resilient algorithms in high-performance computing are subject to rigorous non-functional constrain...