Standard parallel sorting algorithms such as sample sort rely on data partitioning to distribute keys across processors. On massive clusters, the sample size that sample sort needs to achieve good load balance becomes prohibitively expensive. We describe Histogram sort with sampling, an adaptation of the popular Histogram sort algorithm. We show that Histogram sort with sampling has sound theoretical guarantees and, with high probability, reduces the sample size requirement from $O(p \log N / \epsilon^2)$ to $O(k p \sqrt[k]{\log p / \epsilon})$ with $k$ rounds of histogramming, where $p$ is the number of processors, $N$ the number of keys, and $\epsilon$ the allowed load imbalance. Histogram sort with sampling is more efficient than sample sort algorithms that achieve the same level of load balance, both in theory and in practice, especially for massively parallel applications, scaling to tens of th...
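To make "rounds of histogramming with sampling" concrete, here is a minimal single-process sketch in Python. It is not the authors' implementation: the helper names `global_rank` and `histogram_splitters`, the per-round sample size `samples_per_chunk`, and the acceptance tolerance of $\epsilon N / (2p)$ ranks per splitter are our own assumptions. Each inner list plays the role of one processor's keys. A round samples keys only from splitter intervals that are still unresolved, computes their global ranks (the histogram, which on a real machine would be a local binary search followed by a sum reduction), and then either accepts a probe whose rank is close enough to the target $iN/p$ or narrows that target's key interval. If every splitter lands within $\epsilon N/(2p)$ of its target rank, no bucket exceeds $(1+\epsilon)N/p$ keys.

```python
import bisect
import random


def global_rank(local_sorted, key):
    # Global histogram entry for `key`: number of keys < key over all chunks.
    # On a real machine: a local bisect per processor, then a sum all-reduce.
    return sum(bisect.bisect_left(chunk, key) for chunk in local_sorted)


def histogram_splitters(local_data, eps, samples_per_chunk, rounds, seed=0):
    """Pick p-1 splitters so every bucket holds at most (1 + eps) * n / p keys.

    Hypothetical, simplified single-process simulation of histogramming with
    sampling. Returns (splitters, local_sorted, n); unresolved splitters are None.
    """
    rng = random.Random(seed)
    p = len(local_data)
    local_sorted = [sorted(chunk) for chunk in local_data]
    n = sum(len(chunk) for chunk in local_sorted)
    tol = eps * n / (2 * p)  # assumed per-splitter rank tolerance
    targets = [i * n / p for i in range(1, p)]
    # Key interval (lo, hi) known to bracket each target rank.
    bounds = [[float("-inf"), float("inf")] for _ in targets]
    splitters = [None] * len(targets)

    for _ in range(rounds):
        open_ids = [i for i, s in enumerate(splitters) if s is None]
        if not open_ids:
            break
        # 1. Each "processor" samples only keys inside unresolved intervals.
        probes = set()
        for chunk in local_sorted:
            cands = [k for k in chunk
                     if any(bounds[i][0] < k < bounds[i][1] for i in open_ids)]
            probes.update(rng.sample(cands, min(samples_per_chunk, len(cands))))
        # 2. Histogram: global rank of every probe.
        ranks = {key: global_rank(local_sorted, key) for key in probes}
        # 3. Accept a probe within tolerance, otherwise shrink the interval.
        for i in open_ids:
            for key, r in ranks.items():
                if abs(r - targets[i]) <= tol:
                    splitters[i] = key
                    break
                if r < targets[i]:
                    bounds[i][0] = max(bounds[i][0], key)
                else:
                    bounds[i][1] = min(bounds[i][1], key)
    return splitters, local_sorted, n


if __name__ == "__main__":
    rng = random.Random(1)
    keys = list(range(10_000))
    rng.shuffle(keys)
    p = 8
    local_data = [keys[j::p] for j in range(p)]  # 8 simulated processors
    splitters, local_sorted, n = histogram_splitters(local_data, 0.1, 64, 4)
    cuts = [0] + [global_rank(local_sorted, s) for s in splitters] + [n]
    print([b - a for a, b in zip(cuts, cuts[1:])])  # bucket sizes
```

Because later rounds sample only inside the shrinking intervals, each round can use far fewer samples than a single-shot sample sort would need for the same tolerance, which is roughly the intuition behind the smaller sample-size bound.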
Previous schemes for sorting on general-purpose parallel machines have had to choose between poor lo...
Parallel sorting algorithms have been proposed for a variety of multiple instruction streams, multip...
Sorting is a commonly used process with a wide breadth of applications in the high perfor...
Sample sort, a generalization of quicksort that partitions the input into many pieces, is ...
Histograms are used in various fields to quickly profile the distribution of a large amount of data....
Histogramming is a technique by which input datasets are mined to extract features and patterns. His...
Random sampling is a standard technique for constructing (approximate) histograms for query optimiza...
Sorting is an important problem in computing that has a rich history of investigation by various res...
We demonstrate that parallel deterministic sample sort for many-core GPUs (GPU Bucket Sort) is not o...
Histogramming is a tool commonly used in data analysis. Although its serial version is simple to imp...
In this paper, a refined deterministic sampling strategy is presented. It makes it possible to improve ...
We consider estimation of arbitrary range partitioning of data values and ranking of frequ...