Fault rates in modern hardware are rising as operating voltages and feature sizes shrink. Until recently, hardware faults were a concern mainly for high-availability enterprise servers and for safety-critical applications in domains such as transport and aerospace. These fields have very tight requirements, but also higher budgets. As fault rates increase, however, fault tolerance is also becoming necessary in applications with much smaller profit margins. This brings to the fore the idea of software-implemented hardware fault tolerance, that is, the ability to detect and tolerate hardware faults using software-based techniques in commodity CP...
Fault tolerance has become an essential concern for processor designers due to increasing soft-error...
According to Moore’s law, technology scaling is continuously providing smaller and faster devices. T...
In this dissertation we address the overhead reduction of fault tolerance (FT) techniques. Due to te...
To improve performance and reduce power, processor designers employ advances that shrink feature siz...
Modern processors continue to aggressively scale down the feature size and reduce voltage levels to ...
Technology scaling has led to growing concerns about reliability in micro-processors. Currently, fau...
This article proposes a software error mitigation approach that uses the single instruction multiple...
As technology scales, the hardware reliability challenge affects a broad computing market, rendering...
As machines increase in scale, it is predicted that failure rates of supercomputers will correspondi...
There is broad consensus among academic and industrial researchers in computer architecture that har...
With the advent of multicores, there is demand for monitoring parallel programs running on multicores...
The negative impact of the aggressive scaling of technology nodes on the sensitivity of CMOS devices...
Hardware errors are projected to increase in modern computer systems due to shrinking feature sizes ...
Software-implemented fault injection (SWIFI) is an established experimental technique to evaluate th...
This article analyzes diverse criteria for effectively implementing selective hardening against soft...