To ensure the scalability of big data analytics, approximate MapReduce platforms have emerged to explicitly trade off accuracy for latency. A key step in determining optimal approximation levels is to capture the latency of big data jobs, which has long been deemed challenging due to the complex dependencies among data inputs and map/reduce tasks. In this paper, we use matrix analytic methods to derive stochastic models that can predict a wide spectrum of latency metrics, e.g., averages, tails, and distributions, for approximate MapReduce jobs that are subject to input sampling and task dropping strategies. In addition to capturing the dependency among waves of map/reduce tasks, our models incorporate two job scheduling policies, namely, exclusive and ov...
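The abstract ties job latency to waves of map/reduce tasks, input sampling, and task dropping. The sketch below is not the matrix-analytic model the abstract refers to; it is a plain Monte Carlo simulation that illustrates how the two approximation knobs shift mean and 95th-percentile latency. It assumes (hypothetically) i.i.d. exponential task times, waves of at most `slots` tasks that run back to back, a reduce phase that starts only after the map phase ends (no overlap), and illustrative parameter names such as `sampling_ratio` and `drop_ratio`.

```python
import math
import random
import statistics


def wave_latency(n_tasks, drop_ratio, mean_task_time, rng):
    """Time until one wave finishes when the slowest floor(drop_ratio * n_tasks)
    tasks are dropped. Task times are assumed i.i.d. exponential (an assumption)."""
    times = sorted(rng.expovariate(1.0 / mean_task_time) for _ in range(n_tasks))
    keep = max(1, n_tasks - int(drop_ratio * n_tasks))
    return times[keep - 1]  # the keep-th smallest completion time


def phase_latency(n_tasks, slots, drop_ratio, mean_task_time, rng):
    """Run tasks in sequential waves of at most `slots` tasks and sum wave latencies."""
    total = 0.0
    remaining = n_tasks
    while remaining > 0:
        wave = min(slots, remaining)
        total += wave_latency(wave, drop_ratio, mean_task_time, rng)
        remaining -= wave
    return total


def simulate_job(n_map, n_reduce, slots, sampling_ratio, drop_ratio,
                 mean_map=1.0, mean_reduce=2.0, runs=10_000, seed=0):
    """Return (mean, 95th percentile) job latency over `runs` simulated jobs.
    Input sampling is modeled (hypothetically) as processing only
    ceil(sampling_ratio * n_map) map tasks; reduce starts after all maps finish."""
    rng = random.Random(seed)
    n_sampled_maps = max(1, math.ceil(sampling_ratio * n_map))
    samples = []
    for _ in range(runs):
        t = phase_latency(n_sampled_maps, slots, drop_ratio, mean_map, rng)
        t += phase_latency(n_reduce, slots, drop_ratio, mean_reduce, rng)
        samples.append(t)
    samples.sort()
    p95 = samples[int(0.95 * (len(samples) - 1))]
    return statistics.fmean(samples), p95
```

For example, `simulate_job(40, 8, 10, 1.0, 0.0)` estimates an exact job, while `simulate_job(40, 8, 10, 0.5, 0.1)` samples half of the map inputs and drops the slowest 10% of tasks in each wave; the second call should report noticeably lower mean and tail latency. A matrix-analytic model would instead derive these quantities analytically, which also lets it capture dependencies that this toy simulation ignores.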
Today’s big data clusters based on the MapReduce paradigm are capable of executing analysis jobs wit...
We are in the computing era of data measured in zettabytes and beyond (a.k.a. Big Data). Big Data is critical to dev...
Big Data analytics is increasingly performed using the MapReduce paradigm and its open-source implem...
MapReduce is a highly acclaimed programming paradigm for large-scale information processing...
The MapReduce framework has become the state-of-the-art paradigm for large-scale data processing. In our...
Research has shown that approximate computing is effective at reducing the resource requirements, co...
We tackle the problem of predicting the performance of MapReduce applications by designing accurate pro...
Today, most modern online services make use of big data analytics systems to extract useful informat...
For various types of enterprise and scientific applications as well as cyber-physical systems (such ...
Big data and its analysis are the focus of the current era. The volume of data production is tremendo...
MapReduce has become the standard model for supporting big data analytics. In particular, MapReduce ...
A well-known problem when executing data-intensive workloads with such frameworks as MapReduce is th...
This paper discusses big data processing with MapReduce using volunteer computing in d...
Many large-scale data analytics infrastructures are employed for a wide variety of jobs, ranging fro...
We are entering a Big Data world. Many sectors of our economy are now guided by data-driven decision...