Over the past decade, the demand for real-time processing of huge amounts of streaming data has emerged and grown rapidly. Apache Storm, Apache Flink, Samza, and many other stream processing frameworks have been proposed and implemented to meet this need. Although much effort has gone into reducing the average latency of stream processing systems, shortening their tail latency has received little attention. This thesis presents a series of novel techniques for reducing tail latency in stream processing systems such as Apache Storm. Concretely, we present three mechanisms: (1) adaptive timeout coupled with selective replay to catch straggler tuples; (2) shared queues among different tasks of the same operator to reduce overall queuei...
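The abstract only names the first mechanism. As a rough illustration, here is a minimal sketch that assumes the replay timeout is derived from a high percentile of recently observed tuple latencies scaled by a safety factor, and that only tuples still pending past that timeout are replayed. The class `AdaptiveTimeout`, the helper `select_for_replay`, and the parameters `window`, `pct`, and `factor` are hypothetical. This is not the thesis's implementation and not a Storm API.

```python
from collections import deque


class AdaptiveTimeout:
    """Hypothetical sketch: the replay timeout follows a high percentile of
    recently observed tuple latencies instead of a fixed value."""

    def __init__(self, window=1000, pct=99, factor=1.5, initial=1.0):
        self.samples = deque(maxlen=window)  # sliding window of latencies (seconds)
        self.pct = pct          # integer percentile, e.g. 99 for p99
        self.factor = factor    # safety margin applied on top of the percentile
        self.initial = initial  # fallback timeout before any latency is observed

    def record(self, latency):
        """Record the latency of a fully processed (acknowledged) tuple."""
        self.samples.append(latency)

    def timeout(self):
        """Current timeout: factor * nearest-rank pct-th percentile of the window."""
        if not self.samples:
            return self.initial
        ordered = sorted(self.samples)
        # Nearest-rank percentile via integer ceiling division (no float rounding).
        rank = max(1, (self.pct * len(ordered) + 99) // 100)
        return self.factor * ordered[rank - 1]


def select_for_replay(pending, now, timeout):
    """Selective replay: return only the ids of pending tuples older than the timeout.

    `pending` maps a hypothetical tuple id to the time the tuple was emitted.
    """
    return [tid for tid, emitted_at in pending.items() if now - emitted_at > timeout]


if __name__ == "__main__":
    at = AdaptiveTimeout(window=100, pct=90, factor=2.0)
    for lat in [0.010] * 9 + [0.5]:  # nine fast tuples and one straggler
        at.record(lat)
    t = at.timeout()
    print(t)  # prints 0.02
    print(select_for_replay({"a": 0.0, "b": 0.95, "c": 0.99}, now=1.0, timeout=t))  # prints ['a', 'b']
```

Using a percentile rather than the maximum keeps a single straggler (the 0.5 s sample above) from inflating the timeout, so other slow tuples are still caught and replayed promptly. The window size, percentile, and factor are illustrative choices, not values taken from the thesis.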
Bachelor's Thesis at TU Berlin's Telecommunication Networks Group headed by Prof. Adam Wolisz. Abst...
Real-time analysis of continuous data streams using distributed systems is an emerging class of dat...
We are undeniably living in the era of big data, where people and machines generate information at a...
Present-day computing systems have to deal with continuous growth in data rates and volumes. Process...
Batch processing technologies (such as MapReduce, Hive, and Pig) have matured and been widely used in th...
This paper describes a benchmark for stream processing frameworks allowing accurate latency benchmar...
Recent advances in stream processing systems have enabled applications to exploit fast-changing d...
As data permeates all disciplines, the role of big data becomes increasingly important. Sensors, IoT...
With the upswing in the volume of data, online information, and massive cloud applications, big ...
Distributed stream processing systems are gaining momentum today as a tool to ...
Load shedding is a technique employed by stream processing systems to handle u...
The need for scalable and efficient stream analysis has led to the development of many open-source s...
Large Internet companies like Facebook, Amazon, and Twitter are increasingly recognizing the value o...