In this paper we concentrate on a crucial parameter for efficiency in Big Data and HPC applications: data locality. We focus on the scheduling of a set of independent tasks, each depending on an input file. We assume that each of these input files has been replicated several times and placed in the local storage of different nodes of a cluster, similar to what is done in HDFS, for example. We consider two optimization problems, related to the two natural metrics: makespan optimization (under the constraint that only local tasks are allowed) and communication optimization (under the constraint of never letting a processor idle, in order to optimize makespan). For both problems we investigate the performance of dynamic schedulers, in p...
Management of Big Data is a challenging issue. The MapReduce environment is the widely used key solu...
Lightweight threads have become a common abstraction in the field of programming languages and opera...
We consider the classical First Come First Served/backfilling algorithm which...
In this paper we concentrate on a crucial parameter for efficiency in Big Data...
Replication of data files, as automatically performed by Distributed File Syst...
MapReduce is a well-known framework for distributing data-processing computations onto parallel cluste...
Network bandwidth is a scarce resource in big data environments, so data locality is a fundamental p...
Cloud computing has become more popular for a decade; it has been under continuous devel...
Data locality is a fundamental issue for data-parallel applications. Considering MapReduce in Hadoop...
Inspired by the victory of Apache's Hadoop this paper suggests a new reduce task scheduler. ...
In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce...
Scheduling theory is a common tool to analyze the performance of parallel and distributed ...
MapReduce is a powerful platform for large-scale data processing. To achieve good performance, a Map...
Recent years have witnessed the prevalence of MapReduce-based systems, e.g., Apache Hadoop, in large...
MapReduce emerges as an important distributed programming paradigm for large-scale applications. Ru...