Cluster-based data-parallel frameworks such as MapReduce, Hadoop, and Dryad are increasingly popular for a large class of compute-intensive tasks. Such systems are designed for large-scale clusters and employ several techniques to decrease the run time of jobs in the presence of failures, slow machines, and other effects. In this paper, we apply Dryad to smaller-scale, “ad-hoc” clusters such as those formed by aggregating the servers and workstations in a small office. We first show that, while Dryad’s greedy scheduling algorithm performs well at scale, it is significantly less effective in a small (5-10 machine) cluster environment where nodes have widely differing performance characteristics. We further show that in such cases, performance...
The arrival of multicore architectures has generated an interest in reformulating dense matrix compu...
A well-known problem when executing data-intensive workloads with such frameworks as MapReduce is th...
We are entering a Big Data world. Many sectors of our economy are now guided by data-driven decision...
In recent years there has been an extraordinary growth of large-scale data processing and related te...
Big Data such as Terabyte and Petabyte datasets are rapidly becoming the new norm for various organi...
MapReduce is emerging as an important programming model for large-scale data-parallel applications s...
Over the last ten years MapReduce has emerged as one of the staples of distributed computing both in...
Extensive data analysis has become the enabler for diagnostics and decision making in many modern sy...
MapReduce is a framework proposed by Google for processing huge amounts of data in a distributed env...
This paper develops new schedulability bounds for a simplified MapReduce workflow model. Ma...
In this paper, we target one subset of production MapReduce workloads that contain some indep...
Data intensive computing holds the promise of major scientific breakthroughs and discoveries from th...
Nowadays, analyzing large amounts of data is of paramount importance for many companies. Big data and...
Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications....
The success of modern applications depends on the insights they collect from their data repositories...