To reduce the impact of network congestion on big data jobs, cluster management frameworks use various heuristics to schedule compute tasks and/or network flows. Most of these schedulers consider the job input data fixed and greedily schedule the tasks and flows that are ready to run. However, a large fraction of production jobs are recurring with predictable characteristics, which allows us to plan ahead for them. Coordinating the placement of data and tasks of these jobs allows for significantly improving their network locality and freeing up bandwidth, which can be used by other jobs running on the cluster. With this intuition, we develop Corral, a scheduling framework that uses characteristics of future workloads to determine an offl...
The MapReduce framework has become the de facto scheme for scalable semi-structured and unstructured...
The field of distributed computer systems, while not new in computer science, is still the subject o...
Running MapReduce applications in shared clusters is becoming increasingly compelling to improve the...
Scheduling in large scale computing clusters is critical to job performance and resource utilization...
Distributed data-parallel processing systems like MapReduce, Spark, and Flink are popular for analyz...
With the growing business impact of distributed big data analytics jobs, it has become crucial to op...
In this paper, we utilize a bandwidth-centric job communication model that captures the i...
© 2018 IEEE. Many datacenters usually process complex jobs such as MapReduce jobs. From a network pe...
MapReduce can speed up the execution of jobs operating over big data. A MapReduce job can be divided...
With the accretion in use of the Internet in everything, a prodigious influx of data is being ob...
Inspired by the success of Apache's Hadoop, this paper suggests a new reduce task scheduler. ...
In Grids, scheduling decisions are often made on the basis of jobs being either data or computation i...
Coscheduling has been shown to be a critical factor in achieving efficient parallel execution in tim...
Thanks to the exponential growth of data that needs to be processed in cloud datacenters, data paral...