Graduation thesis (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Computación, 2018. As HPC systems move towards extreme scale, soft errors leading to silent data corruption become a major concern. In this thesis, we propose a set of three optimizations to the classical Redundant Multithreading (RMT) approach to allow faster soft error detection. First, we leverage Simultaneous Multithreading (SMT) to collocate sibling replicated threads on the same physical core so that they can efficiently exchange data and expose errors. Some HPC applications cannot fully exploit SMT for performance improvement; instead, we propose to use these additional resources for fault tolerance. Second, we present variable aggregati...
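The core RMT mechanism described in the thesis abstract above (two replicated threads exchanging values so that any divergence exposes a soft error) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes a bounded `queue.Queue` as the channel between a leading and a trailing thread, simulates a soft error by flipping one bit of a chosen result, and uses hypothetical names (`compute`, `run_rmt`, `fault_at`, `queue_size`). The Python threads are not pinned to SMT siblings; collocating them on one physical core would need OS-level affinity control, which this sketch omits.

```python
import threading
from queue import Queue


def compute(x: int) -> int:
    """Hypothetical replicated computation; both threads run exactly this."""
    return x * x + 3 * x + 1


def run_rmt(inputs: list, fault_at=None, queue_size: int = 8) -> list:
    """Run a leading/trailing thread pair and return indices where they disagree.

    fault_at (hypothetical) selects one result in the leading thread to corrupt,
    standing in for a soft error.
    """
    channel = Queue(maxsize=queue_size)  # bounded buffer between the replicas
    mismatches = []  # written only by the trailing thread

    def leading() -> None:
        for i, x in enumerate(inputs):
            value = compute(x)
            if i == fault_at:
                value ^= 1 << 4  # simulated soft error: flip bit 4
            channel.put(value)  # blocks if the trailing thread lags too far

    def trailing() -> None:
        for i, x in enumerate(inputs):
            produced = channel.get()  # value forwarded by the leading thread
            if produced != compute(x):  # recompute and compare to expose errors
                mismatches.append(i)

    threads = [threading.Thread(target=leading), threading.Thread(target=trailing)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mismatches


if __name__ == "__main__":
    print(run_rmt(list(range(100))))               # no fault injected
    print(run_rmt(list(range(100)), fault_at=42))  # fault injected at index 42
```

Without an injected fault the trailing thread reports no mismatches; with `fault_at=42` the comparison flags exactly index 42. The bounded queue keeps the leading thread at most roughly `queue_size` results ahead of the trailing one, so that parameter trades buffer memory for slack between the replicas.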
The rising count and shrinking feature size of transistors within modern computers are making them in...
In High Performance Computing (HPC), the demand for more performance is satisfied by increasing the n...
Continuous improvements in transistor scaling together with microarchitectural advances have made po...
Many methods are available to detect silent errors in high-performance computi...
Abstract: Many methods are available to detect silent errors in high-performance computing (HPC) app...
As high-performance computing (HPC) continues to progress, constraints on HPC system design force t...
Resilient algorithms in high-performance computing are subject to rigorous non-functional constrain...
Smaller transistor sizes and reduction in voltage levels in modern microprocessors induce higher sof...
The coming exascale era is a great opportunity for high performance computing (HPC) applications. Ho...
The increasing computing capacity of multicore components like processors and graphics processing un...
Redundant multithreading (RMT) is an effective reliability solution that provides thread-level repli...
Transient faults are becoming a critical concern given current design trends in general-purpose mu...
Journal Article: Due to shrinking transistor sizes and lower supply voltages, transient faults (soft e...
Journal Article: Redundant multi-threading (RMT) has been proposed as an architectural approach that ...