Resource management systems like YARN or Mesos enable users to share cluster infrastructures by running analytics jobs in temporarily reserved containers. These containers are typically not isolated, so that high overall resource utilization can be achieved despite the often fluctuating resource usage of individual analytics jobs. However, some combinations of jobs utilize the resources better and interfere less with each other than others when running on the same nodes. This paper presents an approach for improving resource utilization and job throughput when scheduling recurring data analysis jobs in shared cluster environments. Using a reinforcement learning algorithm, the scheduler continuously learns which jobs are best executed simultan...
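As a rough illustration of the idea of learning which recurring jobs co-locate well, the following minimal sketch uses a simple epsilon-greedy value estimate per job pair. It is not the algorithm from the paper, whose details are not given here. The class name `CoLocationLearner`, its parameters, the job names, and the reward (assumed to be a single number in which higher means better utilization and less interference) are all hypothetical.

```python
import random
from collections import defaultdict


class CoLocationLearner:
    """Epsilon-greedy estimate of how well job pairs run together.

    Hypothetical sketch: the reward is assumed to be a number supplied by
    the caller, e.g. derived from measured node utilization.
    """

    def __init__(self, epsilon=0.1, step_size=0.2, seed=0):
        self.epsilon = epsilon        # probability of exploring a random job
        self.step_size = step_size    # weight given to the newest observation
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # estimated reward per unordered pair

    @staticmethod
    def _key(job_a, job_b):
        # Order-independent key so (a, b) and (b, a) share one estimate.
        return tuple(sorted((job_a, job_b)))

    def choose(self, running_job, queued_jobs):
        """Pick a queued job to place next to running_job."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(queued_jobs)
        return max(queued_jobs,
                   key=lambda job: self.value[self._key(running_job, job)])

    def update(self, job_a, job_b, reward):
        """Move the pair's estimate toward the observed reward."""
        key = self._key(job_a, job_b)
        self.value[key] += self.step_size * (reward - self.value[key])


if __name__ == "__main__":
    learner = CoLocationLearner(epsilon=0.0)
    learner.update("sort", "kmeans", 0.9)  # assumed: this pair ran well together
    learner.update("grep", "sort", 0.3)    # assumed: this pair interfered more
    print(learner.choose("sort", ["grep", "kmeans"]))
```

Because the jobs are recurring, the same pairs are observed repeatedly, so even a simple running estimate like this one can be refined over time. A constant step size keeps the estimate responsive if a job's behavior drifts.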
With the growing business impact of distributed big data analytics jobs, it has become crucial to op...
M.Phil. Extensive studies have been conducted on cluster resource utilization due to the large invest...
MapReduce is presently established as an important distributed and parallel programming mode...
Distributed data-parallel processing systems like MapReduce, Spark, and Flink are popular for analyz...
Resource usage of production workloads running on shared compute clusters often fluctuates significan...
Increasingly large datasets make scalable and distributed data analytics necessary. Frameworks such ...
Co-scheduling of jobs in data centers is a challenging scenario where jobs can compete for resources...
The MapReduce framework has become the de facto scheme for scalable semi-structured and unstructured...
To reduce the impact of network congestion on big data jobs, cluster management frameworks use vario...
The standard scheduler of Hadoop does not consider the characteristics of jobs such as computational...
Part 4: Green Computing and Resource Management. We present a resource-aware sch...
The field of distributed computer systems, while not new in computer science, is still the subject o...
In this paper we present a MapReduce task scheduler for shared environments in which MapReduce is ex...
MapReduce can speed up the execution of jobs operating over big data. A MapReduce job can be divided...
Thanks to the exponential growth of data that needs to be processed in cloud datacenters, data paral...