We consider the problem of performing t tasks in a distributed system of p fault-prone processors. This problem, called do-all herein, was introduced by Dwork, Halpern and Waarts. Our work deals with a synchronous message-passing distributed system with processor stop-failures and restarts. We present two new algorithms based on a new aggressive coordination paradigm by which multiple coordinators may be active as the result of failures. The first algorithm tolerates f < p stop-failures and does not allow restarts. Its available processor steps (work) complexity is S = O((t + p log p / log log p) log f) and its message complexity is M = O(t + p log p / log log p + fp). Unlike prior solutions, our algorithm uses redundant broadcasts when encou...
A collection of protocols to facilitate detection of the termination of a computation on a distribu...
It is difficult to design and verify distributed programs that execute correctly despite transient ...
This paper presents a new checkpointing coordination scheme which utilizes the communication pattern...
We consider the problem of performing t tasks in a distributed system of p fault-prone processors. Th...
This paper presents a new message-passing algorithm, called Do-UM, for distributed cooperative task ...
Abstract. Do-All is the abstract problem of using n processors to cooperatively perform m independent ...
Abstract. A fundamental problem in distributed computing is performing a set of tasks despite failur...
Abstract. The ability to cooperate on common tasks in a distributed setting is key to solving a bro...
The problem of performing t tasks in a distributed system on p failure-prone processors is one of the...
The ability to cooperatively perform a collection of tasks in a distributed setting is key to solvin...
Abstract. The Do-All problem is about scheduling t similar and independent tasks to be performed by p ...
Often hard real-time systems require results that are produced on time despite the occurrence of pro...
Abstract. This paper considers the problem of performing tasks in asynchronous distributed settings. T...
This paper presents a new checkpointing algorithm for systems using reliable communication channels....
The fault-tolerance of distributed algorithms is investigated in asynchronous message passing system...