Distributed graph processing systems largely rely on proactive techniques for failure recovery. Unfortunately, these approaches (such as checkpointing) entail significant overhead. In this paper, we argue that distributed graph processing systems should instead use a reactive approach to failure recovery. The reactive approach accepts a slightly inaccurate result in exchange for zero overhead during failure-free execution. We build a system called Zorro that embodies this reactive approach, and integrate Zorro into two graph processing systems, PowerGraph and LFGraph. When a failure occurs, Zorro opportunistically exploits vertex replication (inherent in today’s graph processing systems) to ...
We have addressed the complex problem of recovery for concurrent failures in distributed computing e...
Applicative systems are promising candidates for achieving high performance computing through aggreg...
In this work we have addressed the complex problem of recovery for concurrent failures in a distribu...
Distributed graph processing systems largely rely on proactive techniques for failure recovery. Unf...
Distributed graph processing frameworks have become increasingly popular for processing large graphs...
Distributed graph processing systems are an emerging area of big data systems. As graphs continue to...
Distributed graph processing systems increasingly require many compute nodes to cope with the requir...
Real-world graph processing applications often require combining the graph data with tabular data. M...
While various iterative graph algorithms can be expressed via asynchronous parallelism, lack of its ...
In contrast to conventional (trans)action concepts, the proposed dynamic action model includes the p...
Fault-tolerance protocols play an important role in today's long-running scientific parallel appli...
Checkpointing in a distributed system is essential for recovery to a globally consistent state after...
Large-scale graph and machine learning analytics widely employ distributed iterative processing. Typ...