In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared-nothing database. Rather than aborting and restarting queries, our system, Osprey, divides running queries into subqueries and replicates data such that each subquery can be rerun on a different node if the node initially responsible fails or returns too slowly. Our approach is inspired by the fault tolerance properties of MapReduce, in which map or reduce jobs are greedily assigned to workers, and failed jobs are rerun on other workers. Osprey is implemented using a middleware approach, with only a small amount of custom code to handle cluster coordination. Each node in the system is a discrete database system running on a sepa...
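The rerun-on-another-node idea described above can be illustrated with a minimal sketch. This is not Osprey's implementation: the function name `run_with_reassignment`, the `replicas` mapping, and the `execute` callback are all hypothetical names introduced here, and the sketch assumes a node failure surfaces as an exception. It does not model nodes that merely respond too slowly.

```python
from collections import deque


def run_with_reassignment(chunks, replicas, execute):
    """Run one subquery per chunk, retrying on another replica if a node fails.

    All names here are illustrative assumptions, not Osprey's API.
    chunks:   iterable of chunk ids (one subquery per chunk)
    replicas: dict mapping chunk id -> list of node names holding a copy
    execute:  callable(node, chunk) -> partial result; raises on node failure
    """
    pending = deque(chunks)
    tried = {chunk: set() for chunk in pending}
    results = {}
    while pending:
        chunk = pending.popleft()
        # Pick the first replica node that has not already failed on this chunk.
        candidates = [n for n in replicas[chunk] if n not in tried[chunk]]
        if not candidates:
            raise RuntimeError(f"no untried replica left for chunk {chunk}")
        node = candidates[0]
        tried[chunk].add(node)
        try:
            results[chunk] = execute(node, chunk)
        except Exception:
            # Only this subquery is requeued; finished subqueries keep their results.
            pending.append(chunk)
    return results
```

Because each chunk is stored on more than one node, a single node failure costs only the subqueries that node was running, rather than forcing the whole query to restart.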
It is argued that there is a significant class of pipelined large grain data flow computations whose...
Replication of data at more than one site in a distributed database has been reported to increase th...
A distributed database system is subject to site failure and link failure. This paper presents a rea...
Fault tolerance is essential for distributed databases in a client-server environment. In that dis...
Cooperative management of data is a difficult challenge. In the absence of a central authority, ther...
Designing and programming dependable distributed applications is very difficult. Databases...
Because of the high cost and impracticality of a high connectivity network, most recent research in ...
Distributed systems are rapidly increasing in importance due to the need for scalable computatio...
Over the last 2-3 years, the importance of data-intensive computing has increasingly been r...
This paper introduces a generic technique to obtain a shared-storage database cluster from an off-th...
This paper presents an integrated concurrency and recovery algorithm. Strict timestamp ordering was ...
This dissertation investigates the problem of supporting optimistic processing for distributed datab...
We propose a new replication control scheme for multiple-copy consistency in mobile distributed data...