We present an overview of massively parallel deterministic algorithms that combine high fault-tolerance and efficiency. This desirable combination (called robustness here) is nontrivial, since increasing efficiency implies removing redundancy whereas increasing fault-tolerance requires adding redundancy to computations. We study a spectrum of algorithmic models for which significant robustness is achievable, from static fault, synchronous computation to dynamic fault, asynchronous computation. In addition to fail-stop processor models, we examine and deal with arbitrarily initialized memory and restricted memory access concurrency. We survey the deterministic upper bounds for the basic Write-All primitive, the lower bounds on its ef...
Robust computation---a radical approach to fault-tolerant database access---was explicitly defined o...
Energy increasingly constrains modern computer hardware, yet protecting computations and data agains...
Robustness is a fundamental and timeless issue, and it remains vital to all aspects of computation s...
Abstract. Algorithms in synchronous parallel models of computation with processor crashes can be mad...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is ...
In this paper we present an efficient general simulation strategy for computations designed for full...
Some of today’s applications run on computer platforms with large and inexpensive memories, which ar...
The difficulty of designing fault-tolerant distributed algorithms increases with the severity of fa...
Some of the present day applications run on computer platforms with large and inexpensive memories, ...
With few exceptions, the two issues of algorithm design and fault tolerance for processor arrays hav...
© 2018 Association for Computing Machinery. We consider a parallel computational model, the Parallel...
Checkpoint and recovery cost imposed by checkpoint/restart (CP/R) is a crucial performance issue for...
With the proliferation of parallel and distributed systems, it is an increasingly important problem ...
We investigate the coded model of fault-tolerant computations introduc...
In their SIAM J. on Computing paper [33] from 1992, Martel et al. posed a question for developing a ...