Modern CPUs provide single instruction, multiple data (SIMD) instructions. SIMD instructions process several elements of a primitive data type simultaneously in fixed-size vectors. Classical sorting algorithms are not directly expressible in SIMD instructions. Accelerating sorting algorithms with SIMD instructions is therefore a creative endeavor. A promising approach for sorting with SIMD instructions is to use sorting networks for small arrays and Quicksort for large arrays. In this paper we improve vectorization techniques for sorting networks and Quicksort. In particular, we show how to use the full capacity of vector registers in sorting networks and how to make vectorized Quicksort robust with respect to different key distributions. To ...
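The combination named in the abstract (a sorting network for small arrays, Quicksort for large ones) can be illustrated with a minimal scalar sketch. This is not the paper's vectorized implementation: it uses plain Python instead of SIMD instructions, an odd-even transposition network as a stand-in for whatever network the authors use, and a three-way partition as one common way to cope with duplicate keys. The names `SMALL_CUTOFF`, `network_sort`, and `hybrid_sort` are hypothetical and chosen only for this illustration.

```python
SMALL_CUTOFF = 8  # hypothetical threshold; a real implementation would tune it


def compare_exchange(a, i, j):
    """Basic sorting-network comparator: put the smaller key at index i."""
    if a[i] > a[j]:
        a[i], a[j] = a[j], a[i]


def network_sort(a, lo, hi):
    """Sort a[lo:hi] with an odd-even transposition network.

    The sequence of comparators depends only on the length hi - lo,
    never on the data, which is what makes networks amenable to SIMD.
    """
    n = hi - lo
    for rnd in range(n):
        # Even rounds compare (lo, lo+1), (lo+2, lo+3), ...; odd rounds shift by one.
        for i in range(lo + (rnd % 2), hi - 1, 2):
            compare_exchange(a, i, i + 1)


def hybrid_sort(a, lo=0, hi=None):
    """Quicksort for large ranges, sorting network for small ones (in place)."""
    if hi is None:
        hi = len(a)
    while hi - lo > SMALL_CUTOFF:
        # Median-of-three pivot selection (an assumption, not from the paper).
        mid = lo + (hi - lo) // 2
        pivot = sorted((a[lo], a[mid], a[hi - 1]))[1]
        # Three-way partition: [lo, lt) < pivot, [lt, gt) == pivot, [gt, hi) > pivot.
        lt, i, gt = lo, lo, hi
        while i < gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                gt -= 1
                a[i], a[gt] = a[gt], a[i]
            else:
                i += 1
        # Recurse into the smaller side, iterate on the larger one.
        if lt - lo < hi - gt:
            hybrid_sort(a, lo, lt)
            lo = gt
        else:
            hybrid_sort(a, gt, hi)
            hi = lt
    network_sort(a, lo, hi)


if __name__ == "__main__":
    data = [5, 3, 9, 1, 5, 8, 2, 7, 0, 6, 4, 5, 3]
    hybrid_sort(data)
    print(data)
```

In a vectorized version, the comparators of one network round would typically be executed together with vector minimum and maximum instructions, and the partition step would be replaced by a SIMD partitioning routine; both are outside the scope of this sketch.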
Sorting is a basic task in many types of computer applications. Especially when large amounts of dat...
In this paper we present GPU-Quicksort, an efficient Quicksort algorithm suitable for highly paralle...
This paper describes an optimized implementation of a set of scan (also called all-prefix-sums) primiti...
Ever since their introduction, sorting networks have been an active field of study. They can be efficien...
The way developers implement their algorithms and how these implementations be...
Merging and sorting algorithms are the backbone of many modern computer applications. As such, eff...
For decades, improvements in computing speed were achieved through the clock frequency of the ...
We have designed a radix sort algorithm for vector multiprocessors and have implemented the algorith...
Sample sort, a generalization of quicksort that partitions the input into many pieces, is ...
Register renaming and out-of-order instruction issue are now commonly used in superscalar processors...
We address the problem of sorting a large number N of keys on a MasPar MP-1 parallel SIMD machine of...
Sorting a set of items is a task that can be useful by itself or as a building block for more comple...
The authors describe an optimized implementation of a set of scan (also called all-prefix-sums) prim...