We analyze YARN container overhead and present early results of reducing its overhead by dynamically adjusting the input split size. YARN is designed as a generic resource manager that decouples programming models from resource management infrastructures. We demonstrate that YARN's generic design incurs significant overhead because each container must perform various initialization steps, including authentication. To reduce container overhead without changing the existing YARN framework significantly, we propose leveraging the input split, which is the logical representation of physical HDFS blocks. With input splits, we can combine multiple HDFS blocks and increase the input size of each container, thereby enabling a single map wa...
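The effect of combining HDFS blocks into larger input splits can be illustrated with simple arithmetic. The following is a minimal sketch, not the authors' method: it assumes one map container per input split and a fixed initialization cost per container. The function names and the 2-second per-container cost are hypothetical and chosen only for illustration.

```python
def count_map_containers(input_bytes, block_bytes, blocks_per_split):
    """Number of map containers, assuming one container per input split."""
    split_bytes = block_bytes * blocks_per_split
    # Integer ceiling division: a final partial split still needs a container.
    return -(-input_bytes // split_bytes)


def total_init_overhead_s(input_bytes, block_bytes, blocks_per_split,
                          per_container_overhead_s):
    """Aggregate initialization time across all map containers (hypothetical model)."""
    containers = count_map_containers(input_bytes, block_bytes, blocks_per_split)
    return containers * per_container_overhead_s


GIB = 1024 ** 3
MIB = 1024 ** 2

# 10 GiB of input stored in 128 MiB HDFS blocks.
one_block = count_map_containers(10 * GIB, 128 * MIB, blocks_per_split=1)
four_blocks = count_map_containers(10 * GIB, 128 * MIB, blocks_per_split=4)
print(one_block, four_blocks)
```

Under these assumptions, grouping four blocks per split cuts the number of containers, and therefore the aggregate initialization cost, by a factor of four. The trade-off is that each container processes more data, so fewer map tasks can run in parallel.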
Remote memory techniques for datacenter applications have recently gained a great deal of popularity...
Efficiently managing resources and improving throughput in a large-scale cluster has become a crucia...
Memory size limits the number of instances available in memory at a single tim...
In this work, we present a novel HDFS block coalescing...
Hadoop clusters have been transitioning from a dedicated cluster environment to a shared cluster env...
In the last year, Hadoop YARN has become the de facto standard resource management platform for data-...
Hadoop YARN is an open project developed by the Apache Software Foundation to provide a resource man...
Hadoop YARN is a software framework that supports data-intensive distributed applications. ...
Containers are considered an optimized fine-grain alternative to virtual machines in cloud-based sys...
The MapReduce framework has become the de facto scheme for scalable semi-structured and un-structure...
The MapReduce framework has become the de facto scheme for scalable semi-structured and un-structured...
Resource management is one of the most indispensable components of cluster-level infrastru...
To address the computing challenge of ’big data’, a number of data-intensive computing frameworks (e...
In recent years, there has been a rising demand for IT solutions that are capable of handling vast amo...