We present a technique that masks failures in a cluster to provide high availability and fault-tolerance for long-running, parallelized dataflows. We can use these dataflows to implement a variety of continuous query (CQ) applications that require high-throughput, 24x7 operation. Examples include network monitoring, phone call processing, click-stream processing, and online financial analysis. Our main contribution is a scheme that carefully integrates traditional query processing techniques for partitioned parallelism with the process-pairs approach for high availability. This delicate integration allows us to tolerate failures of portions of a parallel dataflow without sacrificing result quality. Upon failure, our technique provides qu...
Future computing systems (Teradevices) will probably contain more than 1000 cores on a single die. T...
The MapReduce programming model, due to its simplicity and scalability, has become an essential tool...
Current systems for data-parallel, incremental processing and view maintenance over high-rate stream...
This electronic version was submitted by the student author. The certified thesis is available in th...
It is argued that there is a significant class of pipelined large grain data flow computations whose...
Large-scale graph and machine learning analytics widely employ distributed iterative processing. Typ...
Today, many distributed systems are deployed in high-performance computing environments such as a mu...
Wide-area parallel processing systems will soon be available to researchers to solve a range of prob...
Real-world graph processing applications often require combining the graph data with tabular data. M...
The high parallelism of future Teradevices, which are going to contain more than 1,000 complex cores...
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-t...
Numerous applications in, for example, science, engineering, and financial analysis increasingly requi...
Stream-processing systems are designed to support an emerging class of applications that require sop...
We present a collaborative, self-configuring high availability (HA) approach for stream processing t...