Abstract—Modeling workflow performance is crucial for finding optimal configuration parameters and optimizing execution times. We apply the method of surrogate-based modeling to the performance tuning of MapReduce jobs. We build a surrogate model defined by a multivariate polynomial containing a variable for each parameter to be tuned. For illustrative purposes, we focus on just two parameters: the number of parallel mappers and the number of parallel reducers. We demonstrate that an accurate performance model can be built by sampling only a small subset of the parameter space. We compare the accuracy and cost of building the model when using different sampling methods as well as different modeling approaches. We conclude that the surrogate-based...
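As a rough illustration of the approach this abstract describes, the following Python sketch fits a bivariate polynomial surrogate T(m, r) to a handful of sampled (mappers, reducers, runtime) points and then queries the model across the full configuration grid. The sample configurations, runtimes, grid, and the choice of a degree-2 basis are illustrative assumptions, not values or choices taken from the paper:

```python
import numpy as np

# Hypothetical sketch of a polynomial surrogate for MapReduce runtime:
# fit T(m, r) to a small sample of measured job runtimes, then query
# the cheap model instead of executing every configuration.
# All numbers below are made up for illustration.

# (mappers, reducers) configurations sampled from the parameter space
samples = np.array([(2, 1), (4, 2), (8, 2), (8, 4), (16, 4), (16, 8)],
                   dtype=float)
# measured execution times in seconds (illustrative, not measurements)
runtimes = np.array([410.0, 250.0, 180.0, 150.0, 120.0, 110.0])

def design_matrix(points):
    """Monomial basis 1, m, r, m*r, m^2, r^2 for a degree-2 surrogate."""
    m, r = points[:, 0], points[:, 1]
    return np.column_stack([np.ones_like(m), m, r, m * r, m**2, r**2])

# Least-squares fit of the polynomial coefficients
coeffs, *_ = np.linalg.lstsq(design_matrix(samples), runtimes, rcond=None)

def predict(mappers, reducers):
    """Predicted runtime for an untested (mappers, reducers) pair."""
    point = np.array([[float(mappers), float(reducers)]])
    return float(design_matrix(point) @ coeffs)

# Query the surrogate over the whole grid to pick a promising configuration
grid = [(m, r) for m in (2, 4, 8, 16, 32) for r in (1, 2, 4, 8)]
best = min(grid, key=lambda p: predict(*p))
print("predicted best (mappers, reducers):", best)
```

Querying the fitted polynomial is essentially free, so the expensive job executions are confined to the small training sample; that is the cost advantage the abstract claims for surrogate-based tuning over exhaustively running every configuration.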
While parallel computing offers an attractive perspective for the future, developing efficient paral...
A common simplification made when modeling the performance of a parallel program is the assumption t...
MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoo...
The MapReduce framework has become the state-of-the-art paradigm for large-scale data processing. In our...
Abstract—Performance models have profound impact on hardware-software codesign, architectural explor...
MapReduce job parameter tuning is a daunting and time-consuming task. The parameter configuration s...
Several companies are increasingly using MapReduce for efficient large-scale data processing, such as...
New approaches are necessary to generate performance models in current systems due to the heterogeneit...
The Pi application needs the smallest sampling time. The remaining applications need similar mini...
Abstract—MapReduce is a highly acclaimed programming paradigm for large-scale information processing...
Nowadays, MapReduce and its open-source implementation, Apache Hadoop, are the most widespread soluti...
Automatically building parameterized performance models of applications is difficult because o...
Directed Acyclic Graph (DAG) workflows are widely used for large-scale data analytics in cluster-base...
The context of this paper is big data processing with MapReduce via volunteer computing in d...