Redundant multithreading (RMT) is an effective reliability solution that provides thread-level replication; however, it imposes additional overhead in the form of performance loss or energy consumption. Partial-RMT is an alternative that replicates only part of an executing thread, reducing these overheads at the cost of full fault coverage. In this study, we propose a software-level RMT approach that offers lightweight replication of partial code regions within the same application process. This approach is particularly suitable for applications with varying code criticality, where we determine the critical code regions by performing a fault injection campaign in addition to execution time profile an...
Redundant threading architectures duplicate all instructions to detect and possibly recover from tra...
As machines increase in scale, it is predicted that failure rates of supercomputers will correspondi...
Graduate thesis (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Comput...
Redundant multi-threading (RMT) has been proposed as an architectural approach that ...
Smaller transistor sizes and reduction in voltage levels in modern microprocessors induce higher sof...
Noise and radiation-induced soft errors (transient faults) in computer systems have i...
Due to shrinking transistor sizes and lower supply voltages, transient faults (soft e...
Continued CMOS scaling is expected to make future micro-processors susceptible to transient faults, ...
According to Moore’s law, technology scaling is continuously providing smaller and faster devices. T...
Failing hardware is a fact and trends in microprocessor design indicate that the fraction of hardwar...
In this work we propose partial task replication and check-pointing for task-parallel HPC applicatio...
Fail-stop errors and Silent Data Corruptions (SDCs) are the most common failure modes for High Perfo...
As the scale of supercomputers grows, it is becoming increasingly important for software to efficien...
Exponential growth in the number of on-chip transistors, coupled with reductions in voltage levels,...
The microprocessor industry is rapidly moving to the use of multicore chips as general-purpose proce...