Algorithm-based fault-tolerance (ABFT) is an inexpensive method of incorporating fault-tolerance into existing applications. Applications are modified to operate on encoded data and produce encoded results, which may then be checked for correctness. An attractive feature of the scheme is that it requires little or no modification to the underlying hardware or system software. Previous algorithm-based methods for developing reliable versions of numerical programs for general-purpose multicomputers have mostly concerned themselves with error detection. A truly fault-tolerant algorithm, however, needs to locate errors and recover from them once they have been located. In a parallel processing environment, this corresponds to locating the faulty...
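As a rough illustration of "operating on encoded data and checking encoded results", the sketch below uses the well-known row/column checksum encoding for matrix multiplication. This is an assumption chosen for illustration; the abstract does not say which encoding the paper uses. All function names are hypothetical, and the example uses integer matrices so that checksums compare exactly (floating-point data would need a tolerance).

```python
# Minimal sketch of checksum-based ABFT for matrix multiplication.
# Assumption: at most one data element of the result is corrupted.

def column_checksum(a):
    """Encode A: append a row holding the sum of each column."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]

def row_checksum(b):
    """Encode B: append to each row the sum of that row."""
    return [row[:] + [sum(row)] for row in b]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def locate_error(c_full):
    """Return (row, col) of a single corrupted data element, or None."""
    n = len(c_full) - 1          # number of data rows
    m = len(c_full[0]) - 1       # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None              # detection: all checksums agree
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]   # location: the intersection
    raise ValueError("error pattern is not a single data-element fault")

def correct(c_full, pos):
    """Recover the corrupted element from its row checksum, in place."""
    i, j = pos
    m = len(c_full[0]) - 1
    c_full[i][j] = c_full[i][m] - sum(c_full[i][k] for k in range(m) if k != j)

def demo():
    """Multiply encoded operands, inject one fault, locate and repair it."""
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c_full = matmul(column_checksum(a), row_checksum(b))
    c_full[1][0] = 99            # simulated fault in one result element
    pos = locate_error(c_full)
    correct(c_full, pos)
    return pos, c_full
```

The product of the column-checksum form of A and the row-checksum form of B carries both checksum sets, so a mismatch in exactly one row and one column pinpoints the faulty element. This mirrors the detect, locate, recover sequence the abstract describes, though only for a single corrupted value rather than a faulty processor.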
Numerous algorithms for computationally intensive tasks that are suitable for execution on hypercube...
In Algorithm-based fault tolerance (ABFT), fault tolerance is tailored to the algorithm performed. M...
The scale of parallel computing systems is rapidly approaching dimensions where fault tolerance can...
With the proliferation of parallel and distributed systems, it is an increasingly important problem ...
Checkpoint and recovery cost imposed by checkpoint/restart (CP/R) is a crucial performance issue for...
163 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1985. The concept of algorithm-base...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is ...
Efficient parallel algorithms proposed to solve many fundamental problems in scientific computation ...
Due to the character of the original source materials and the nature of batch digitization, quality ...
With few exceptions, the two issues of algorithm design and fault tolerance for processor arrays hav...
An important consideration in the design of high performance multiprocessor systems is to ensure the...