This paper presents Natjam, a system that supports arbitrary job priorities, hard real-time scheduling, and efficient preemption for resource-constrained MapReduce clusters. Our contributions include: i) smart eviction policies for jobs and for tasks, based on resource usage, task runtime, and job deadlines; and ii) a work-conserving task preemption mechanism. We incorporated Natjam into the Hadoop YARN scheduler framework (in Hadoop 0.23). We present experiments from deployments on a test cluster, Emulab and a Yahoo! commercial cluster, using both synthetic traces and Hadoop cluster traces we obtained from Yahoo!. Our results reveal that Natjam incurs overheads of under 7%. Under real Hadoop workloads, Natjam performs bet...
We are entering a Big Data world. Many sectors of our economy are now guided by data-driven decisi...
With the growing use of the Internet in everything, a prodigious influx of data is being ob...
MapReduce has become the dominant programming model in a cloud-based data processing environment, su...
We present Natjam, a system allowing high priority production jobs and low priority research jobs to...
As computer systems become larger and more complex, such as with the advent of clouds, scientists an...
MapReduce has been widely used as a Big Data processing platform. As it has grown in popularity, its scheduling...
MapReduce can speed up the execution of jobs operating over big data. A MapReduce job can be divided...
In this paper, we explore the feasibility of enabling the scheduling of mixed hard and soft real-tim...
MapReduce has become a popular high performance computing paradigm for large-scale data processing. ...
The MapReduce framework has become the de facto scheme for scalable semi-structured and un-structured...
This study presents a soft deadline scheduler for distributed systems that aims to explore data lo...
The ever-growing need to improve return-on-investment (ROI) for cluster infrastructure that processe...
In recent years there has been an extraordinary growth of large-scale data processing and related te...
In this paper, we target one subset of production MapReduce workloads that contain some indep...