Reliability is of great concern to the scalability of extreme-scale systems. Of particular concern are soft errors in main memory, which are a leading cause of failures on current systems and are predicted to be the leading cause on future systems. While great effort has gone into designing algorithms and applications that can continue to make progress in the presence of these errors without restarting, the most critical software running on a node, the operating system (OS), is currently left relatively unprotected. OS resiliency is of particular importance because, though this software typically represents a small footprint of a compute node's physical memory, recent studies show more memory errors in this region of memory than the remaind...
To achieve a substantial reliability and safety level, it is imperative to pro...
Thesis (Ph. D.)--University of Rochester. Dept. of Electrical and Computer Engineering, 2012. In moder...
According to Moore’s law, technology scaling is continuously providing smaller and faster devices. T...
Abstract: Soft errors are emerging with the ongoing reduction of structure sizes in current and futu...
Abstract—Memory errors are a major source of reliability problems in current computers. Undetected e...
The computing continuum's actual trend is facing a growth in terms of devices ...
Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations,...
Software vulnerabilities widely exist among various software from operating system kernel to web bro...
Continuous shrinking in feature size, increasing power density etc, increases the vulnerability of m...
This thesis investigates techniques for making closed loop control systems fault-tolerant and robust...
Run-time attacks have plagued computer systems for more than three decades, with control-flow hijack...
Traditionally, fault tolerance researchers have made very strict assumptions about program correctne...
The ever-increasing miniaturization of semiconductors has led to important advances in mobile, cloud...
Memory hardware reliability is an indispensable part of whole-system dependability. Its importance...