Replication of data files, as automatically performed by Distributed File Systems such as HDFS, is known to have a crucial impact on data locality in addition to system fault tolerance. Intuitively, having more replicas of the same input file gives the task that reads it more opportunities to be processed locally, i.e. without any input file transfer. Given the practical importance of this problem, a vast literature has proposed algorithms that schedule tasks based on a random placement of replicated input files. Our goal in this paper is to study the performance of these algorithms, both in terms of makespan minimization (minimizing the completion time of the last task when non-local processing is forbidden) and communication m...
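To make the setting above concrete, here is a minimal sketch, assuming unit-length tasks, one input file per task, and `r` replicas of each file placed on `r` distinct servers chosen uniformly at random. The helper names (`place_replicas`, `greedy_local_schedule`, `makespan`) are hypothetical, and the greedy rule (send each task to the least-loaded server holding a replica of its input) is only one illustrative strategy, not the algorithm analyzed in the paper. Because non-local processing is forbidden, each task may only run on a server that stores its input.

```python
import random


def place_replicas(n_tasks, n_servers, r, rng):
    """Hypothetical helper: for each task, place r replicas of its input
    file on r distinct servers chosen uniformly at random."""
    return [rng.sample(range(n_servers), r) for _ in range(n_tasks)]


def greedy_local_schedule(placement, n_servers):
    """Assign each unit-length task, in order, to the least-loaded server
    that holds a replica of its input (local processing only)."""
    load = [0] * n_servers
    assignment = []
    for servers in placement:
        best = min(servers, key=lambda s: load[s])  # only replica holders are eligible
        load[best] += 1
        assignment.append(best)
    return assignment, load


def makespan(load):
    """With unit-length tasks, the completion time of the last task
    equals the maximum number of tasks assigned to any server."""
    return max(load)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000  # as many tasks as servers
    for r in (1, 2, 3):
        placement = place_replicas(n, n, r, rng)
        _, load = greedy_local_schedule(placement, n)
        print(f"r={r}: makespan={makespan(load)}")
```

With `r = 1` every task is pinned to a single random server. With `r >= 2` the sketch becomes the classic "power of d choices" balls-into-bins process, in which the maximum load is known to drop sharply. This is the intuition behind more replicas giving more opportunities for local processing.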
As a result of advances in technology and users' highly demanding expectations, more and more applica...
This paper is devoted to scheduling a large collection of independent tasks onto a distrib...
MapReduce is a popular parallel programming model used to solve a wide range of Big Data applic...
In this paper we concentrate on a crucial parameter for efficiency in Big Data...
Scheduling theory is a common tool to analyze the performance of parallel and distributed ...
This paper is devoted to scheduling a large collection of independent tasks onto heterogeneous...
In systems with complex many-core cache hierarchies, exploiting data locality can significantly reduce...
This paper is devoted to scheduling a large collection of independent tasks onto heterogeneous clust...
MapReduce is a well-known framework for distributing data-processing computations onto parallel cluste...
One typical use case of large-scale distributed computing in data centers is to decompose a computat...
Data locality is a fundamental issue for data-parallel applications. Considering MapReduce in Hadoop...
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and...
Task scheduling has a significant impact on the performance of the MapReduce computing framework. I...
Replication plays an important role for storage systems to improve data availability, throughput and r...