Abstract. Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison-based sorting algorithm for distributed-memory parallel computers. We show that sample sort is also useful on a single processor. The main algorithmic insight is that element comparisons can be decoupled from expensive conditional branching using predicated instructions. This transformation facilitates optimizations like loop unrolling and software pipelining. The final implementation, albeit cache-efficient, is limited by a linear number of memory accesses rather than the O(n log n) comparisons. On an Itanium 2 machine, we obtain a speedup of up to 2 over std::sort from the GCC STL library, which is kno...
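To make the idea of decoupling comparisons from branching concrete, here is a minimal C++ sketch of branch-free bucket classification. It rests on assumptions the abstract does not state: the splitters are stored as an implicit binary search tree in an array, the number of buckets `k` is a power of two, and the names `build_tree` and `classify` are hypothetical. The comparison result is used as an integer in index arithmetic, which a compiler can typically translate into a predicated or conditional-set instruction instead of a conditional jump; whether it does so depends on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: place the k-1 sorted splitters into tree[1..k-1] so
// that node j has children 2j and 2j+1 (implicit binary search tree layout).
void build_tree(const std::vector<int>& splitters, std::vector<int>& tree,
                std::size_t node, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return;                       // empty range: nothing to place
    std::size_t mid = lo + (hi - lo) / 2;
    tree[node] = splitters[mid];                // middle splitter becomes the node
    build_tree(splitters, tree, 2 * node, lo, mid);
    build_tree(splitters, tree, 2 * node + 1, mid + 1, hi);
}

// Hypothetical helper: return the bucket index in [0, k) for element x.
// The loop has a fixed trip count of log2(k) and no data-dependent branch:
// the outcome of the comparison only feeds the index computation.
std::size_t classify(const std::vector<int>& tree, std::size_t k, int x) {
    assert(k >= 2 && (k & (k - 1)) == 0);       // assumption: k is a power of two
    std::size_t j = 1;                          // start at the root
    for (std::size_t step = 1; step < k; step *= 2) {
        j = 2 * j + static_cast<std::size_t>(x > tree[j]);
    }
    return j - k;                               // leaves are numbered k..2k-1
}
```

With splitters 10, 20, 30 and `k = 4`, an element equal to a splitter goes to the bucket on its left, so 10 lands in bucket 0 and 15 in bucket 1. Because the trip count is fixed, the loop can be fully unrolled and interleaved across several elements, which is the kind of optimization the abstract mentions.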
Background: Sorting algorithms are an essential part of computer science. With the use of parallelis...
Many sorting algorithms that perform well on uniformly distributed data suffer significant performan...
Sorting is one of the most fundamental algorithmic kernels, used by a large fraction of computer app...
Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as t...
We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory mach...
Previous schemes for sorting on general-purpose parallel machines have had to choose between poor lo...
Abstract. In this paper, we present HykSort, an optimized comparison sort for distributed memory arch...
The LogP model characterizes the performance of modern parallel machines with a small set of paramet...
Sorting is an important problem in computing that has a rich history of investigation by various res...
Quicksort is a well-known algorithm used for sorting, making O(n log n) comparisons to sort a dataset o...
The problem of determining the relative efficiencies of different sorting algorithms is discussed i...
In this paper we generalize the idea of QuickHeapsort leading to the notion of QuickXsort. Given som...
Sorting is a basic task in many types of computer applications. Especially when large amounts of dat...
We demonstrate that parallel deterministic sample sort for many-core GPUs (GPU BUCKET SORT) is not o...