Software debugging and system reliability/availability are among the most challenging problems facing the computing industry today, with a direct impact on the development and operating costs of computing systems. A promising debugging technique that helps programmers identify and fix the causes of software bugs much more efficiently is bidirectional debugging, which lets the user execute the program in "reverse". A typical method for recovering a system after a fault is backward error recovery, which restores the system to its last error-free state. Both reverse execution and backward error recovery are enabled by creating memory checkpoints, which are used to restore the program/system to a prior point in ...
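The idea that one checkpoint mechanism serves both reverse execution and backward error recovery can be illustrated with a small sketch. It assumes an undo-log scheme over a toy word-addressed memory: every store first records the old value, a checkpoint is just a position in that log, and restoring a checkpoint replays the log backwards. The class and method names (`CheckpointedMemory`, `write`, `checkpoint`, `rollback`) are hypothetical and are not taken from any of the works listed here. Real systems typically implement this in hardware or the OS at cache-line or page granularity.

```python
from typing import Dict, List, Optional, Tuple


class CheckpointedMemory:
    """Toy memory with undo-log checkpoints (hypothetical illustration)."""

    def __init__(self) -> None:
        self.mem: Dict[int, int] = {}
        # Undo log of (address, old value); None means the address was unwritten.
        self.log: List[Tuple[int, Optional[int]]] = []
        # Each checkpoint is the undo-log length at the moment it was taken.
        self.checkpoints: List[int] = []

    def read(self, addr: int) -> int:
        # Unwritten addresses read as 0.
        return self.mem.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        # Save the old value before overwriting it, so the store can be undone.
        self.log.append((addr, self.mem.get(addr)))
        self.mem[addr] = value

    def checkpoint(self) -> int:
        """Record the current state; returns the checkpoint's id."""
        self.checkpoints.append(len(self.log))
        return len(self.checkpoints) - 1

    def rollback(self, cp_id: Optional[int] = None) -> None:
        """Restore memory to checkpoint cp_id (default: the most recent one)."""
        if cp_id is None:
            cp_id = len(self.checkpoints) - 1
        target = self.checkpoints[cp_id]
        # Undo stores newest-first until the log is back at the checkpoint.
        while len(self.log) > target:
            addr, old = self.log.pop()
            if old is None:
                del self.mem[addr]
            else:
                self.mem[addr] = old
        # Later checkpoints point at log entries that no longer exist.
        del self.checkpoints[cp_id + 1:]


mem = CheckpointedMemory()
mem.write(0x10, 1)
cp0 = mem.checkpoint()          # first known-good state
mem.write(0x10, 2)
mem.write(0x20, 7)
cp1 = mem.checkpoint()          # second known-good state
mem.write(0x20, 99)             # suppose a fault corrupts this store
mem.rollback()                  # backward error recovery to cp1
print(mem.read(0x20))           # prints 7
mem.rollback(cp0)               # "reverse" further, back to cp0
print(mem.read(0x10), mem.read(0x20))  # prints 1 0
```

An undo log keeps checkpoints cheap to take, because only the stores that actually happen are copied. The cost is paid at restore time, which is usually acceptable when rollbacks are rare compared with checkpoints.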
Technology scaling and a continual increase in operating frequency have been the main driver of proc...
A checkpoint is defined as a designated place in a program at which normal processing is interrupted s...
The goal of this thesis is to analyze the post-silicon validation hardware infrastructure implemente...
Full system reliability is a problem that spans multiple levels of the software/hardware stack. The...
With rapid growth in computer hardware technologies and architectures, software programs have become...
This thesis contributes to the area of hardware support for parallel programming by introducing new ...
Memory system design is important for providing high reliability and availability. This dissertation...
Today’s supercomputers are built from state-of-the-art components to extract as much performance...
It is a great challenge to build reliable computer systems with unreliable hardware and buggy softwa...
Reversible computing allows one to run programs not only in the usual forward ...
Recent years have seen a dramatic increase in the use of hardware accelerators to perform machine le...
Concurrent programs are ubiquitous, from high-end servers to personal machines, due to the fact ...
This is a post-peer-review, pre-copyedit version of an article published in New Generation Computing...
As we move to large manycores, the hardware-based global checkpointing schemes that have been propo...
As technology feature size continues to shrink, we see two challenging problems in designing compute...