MapReduce is a parallel computing model in which a large dataset is split into smaller parts that are processed on multiple machines. Due to its simplicity, MapReduce has been widely used in various application domains. By processing these parts in parallel, MapReduce can significantly reduce the processing time of large amounts of data. However, when data are not uniformly distributed, so-called partitioning skew arises: the allocation of tasks to machines becomes unbalanced, either because the distribution function splits the dataset unevenly or because part of the data is more complex and requires greater computational effort. To solve this problem, we propose an approach ba...
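The partitioning skew described above can be illustrated with a minimal sketch. It assumes a simple hash-based distribution function (a stable hash of the key modulo the number of reducers) and a made-up input in which one key dominates. The names `partition`, `reducer_loads`, and `skew_ratio`, the key `"hot"`, and the choice of four reducers are hypothetical and are not taken from the abstract. The sketch only shows how skew appears; it is not the approach the authors propose.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Assumed hash partitioner: stable hash of the key modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(records, num_reducers):
    """Count how many (key, value) records each reducer would receive."""
    loads = [0] * num_reducers
    for key, _value in records:
        loads[partition(key, num_reducers)] += 1
    return loads


def skew_ratio(loads):
    """Heaviest reducer load divided by the mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical non-uniform input: one "hot" key accounts for 900 of 1000 records.
records = [("hot", 1)] * 900 + [(f"k{i}", 1) for i in range(100)]
loads = reducer_loads(records, 4)
print("records per reducer:", loads)
print("skew ratio:", round(skew_ratio(loads), 2))
```

Because every record with the same key goes to the same reducer, the reducer that receives the hot key handles at least 900 of the 1000 records, regardless of how evenly the remaining keys are spread. The same ratio could also be computed over estimated processing cost rather than record counts to capture the second cause of skew mentioned above.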
Simulated annealing’s high computational intensity has stimulated researchers to experimen...
Data skew, cluster heterogeneity, and network traffic are three issues that significantly influence ...
Large datasets, on the order of petabytes and terabytes, are becoming prevalent in many scientific dom...
Algorithms for mitigating imbalance in MapReduce computations are considered in this paper. Map...
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an i...
During the shuffle stage of the MapReduce framework, a large volume of data may be relocated to the ...
Hadoop is a standard implementation of the MapReduce framework for running data-intensive applications o...
MapReduce is a programming model and an associated implementation for processing and generating larg...
In an attempt to increase the performance/cost ratio, large compute clusters are becoming heterogene...
The K-Means algorithm is one of the most efficient and widely used algorithms for clustering data. Howe...
In this paper we present a new algorithm for the k-partitioning problem which achieves an improved...
MapReduce is an effective framework for processing large datasets in parallel over a cluster. Data l...
MapReduce is emerging as a prominent tool for big data processing. Data locali...
MapReduce is an effective tool for parallel data processing. One significant issue in practical MapR...
In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the u...