In this paper we concentrate on a crucial parameter for efficiency in Big Data and HPC applications: data locality. We focus on the scheduling of a set of independent tasks, each depending on an input file. We assume that each of these input files has been replicated several times and placed in the local storage of different nodes of a cluster, similar to what is found in HDFS, for example. We consider two optimization problems, related to the two natural metrics: makespan optimization (under the constraint that only local tasks are allowed) and communication optimization (under the constraint of never letting a processor idle, in order to optimize makespan). For both problems we investigate the performance of dy...
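The setting described above (independent tasks, replicated input files, a processor that is never left idle) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the schedulers studied in the paper: the function name `greedy_locality_schedule`, the unit task duration, and the lowest-index tie-breaking rule are all hypothetical choices made for illustration. Whenever a processor becomes free, it takes a task whose input file it holds locally; if none remains, it takes any remaining task and the transfer is counted as one non-local execution.

```python
import heapq


def greedy_locality_schedule(replicas, num_procs, duration=1.0):
    """Simulate a greedy, never-idle, locality-first scheduler.

    replicas[t] is the set of processor ids holding a copy of the input
    file of task t (hypothetical data layout chosen for this sketch).
    Returns (makespan, non_local_count, assignment).
    """
    remaining = set(range(len(replicas)))
    # Min-heap of (time at which the processor becomes free, processor id).
    free_at = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(free_at)

    assignment = {}   # task id -> processor id
    non_local = 0     # number of tasks run without a local replica
    makespan = 0.0

    while remaining:
        now, p = heapq.heappop(free_at)
        # Prefer a task whose input file is stored on processor p.
        local = sorted(t for t in remaining if p in replicas[t])
        if local:
            task = local[0]
        else:
            # No local task left: run a non-local one instead of idling.
            task = min(remaining)
            non_local += 1
        remaining.remove(task)
        assignment[task] = p
        finish = now + duration
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, p))

    return makespan, non_local, assignment


if __name__ == "__main__":
    # Four unit tasks, two processors; task 3 has no replica on processor 1.
    replicas = [{0}, {0}, {1}, {0}]
    print(greedy_locality_schedule(replicas, num_procs=2))
```

In this toy instance, processor 1 runs out of local tasks after its first one, so it executes task 3 remotely rather than staying idle. That trades one file transfer for a shorter makespan, which is the tension between the two metrics named in the abstract.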
Recent years have witnessed the prevalence of MapReduce-based systems, e.g., Apache Hadoop, in large...
We consider the classical First Come First Served/backfilling algorithm which...
Lightweight threads have become a common abstraction in the field of programming languages and opera...
In this paper we concentrate on a crucial parameter for efficiency in Big Data and HPC applications:...
Replication of data files, as automatically performed by Distributed File Syst...
Cloud computing has become more popular for a decade; it has been under continuous devel...
MapReduce is a well-known framework for distributing data-processing computations onto parallel cluste...
Network bandwidth is a scarce resource in big data environments, so data locality is a fundamental p...
Scheduling theory is a common tool to analyze the performance of parallel and distributed ...
Data locality is a fundamental issue for data-parallel applications. Considering MapReduce in Hadoop...
Inspired by the victory of Apache's Hadoop, this paper suggests a new reduce task scheduler. ...
In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce...
MapReduce is a powerful platform for large-scale data processing. To achieve good performance, a Map...
MapReduce emerges as an important distributed programming paradigm for large-scale applications. Ru...
Management of Big Data is a challenging issue. The MapReduce environment is the widely used key solu...