The exploitation of data locality in parallel computers is paramount for reducing memory traffic and communication among processing nodes. We focus on the exploitation of locality by Parallel Radix sort. The original Parallel Radix sort has several communication steps in which one sorting key may have to visit several processing nodes. In response to this, we propose a reorganization of Radix sort that leads to a highly local version of the algorithm at a very low cost. As a key feature, our algorithm performs only one communication step, so keys move only once between processing nodes. It also reduces the amount of data communicated. Finally, the new algorithm achieves good load and communication balance, which makes it insensitive ...
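The abstract does not spell out the reorganization. As a hypothetical illustration only, assuming a scheme in which each key is routed once by its most significant bits and then sorted locally on its remaining bits, the single-exchange idea can be simulated in Python. The function name `partitioned_radix_sort`, the parameters `key_bits` and `digit_bits`, and the power-of-two node count are assumptions for this sketch, and nodes are plain lists rather than real processes exchanging messages.

```python
from typing import List


def partitioned_radix_sort(nodes: List[List[int]], key_bits: int = 16,
                           digit_bits: int = 4) -> List[List[int]]:
    """Hypothetical sketch: route each key once by its top log2(p) bits,
    then radix-sort the remaining low-order bits locally on each node."""
    p = len(nodes)
    top_bits = p.bit_length() - 1          # log2(p) when p is a power of two
    if p == 0 or (1 << top_bits) != p:
        raise ValueError("number of nodes must be a power of two")
    shift = key_bits - top_bits            # bit offset of the routing digit

    # Phase 1 (local): every node splits its keys into one outbox per destination.
    # Assumes 0 <= k < 2**key_bits for every key.
    outboxes = [[[] for _ in range(p)] for _ in range(p)]
    for src, keys in enumerate(nodes):
        for k in keys:
            outboxes[src][k >> shift].append(k)

    # Phase 2 (the only communication step): an all-to-all exchange,
    # so each key moves between nodes at most once.
    received = [[k for src in range(p) for k in outboxes[src][dst]]
                for dst in range(p)]

    # Phase 3 (local): LSD radix sort on the bits below `shift`; the routing
    # bits are identical within a node, so they need no further sorting.
    mask = (1 << digit_bits) - 1
    sorted_nodes = []
    for keys in received:
        for pos in range(0, shift, digit_bits):
            buckets = [[] for _ in range(mask + 1)]
            for k in keys:
                buckets[(k >> pos) & mask].append(k)  # appending keeps the pass stable
            keys = [k for bucket in buckets for k in bucket]
        sorted_nodes.append(keys)
    return sorted_nodes


if __name__ == "__main__":
    demo = [[0xBEEF, 0x0001, 0x7FFF], [0x8000, 0x1234, 0xFFFF],
            [0x4000, 0x0000, 0xC0DE], [0x3FFF, 0x9ABC, 0x5555]]
    print(partitioned_radix_sort(demo))
```

Concatenating the per-node outputs in node order yields the globally sorted sequence. Because routing here uses a fixed bit range, a skewed key distribution would overload some nodes; this sketch does not attempt the load and communication balancing the abstract describes.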
A parallel sorting algorithm for sorting n elements evenly distributed over 2^d = p nodes of a d-dimen...
Sorting is one of the most fundamental algorithmic kernels, used by a large fraction of computer app...
Previous schemes for sorting on general-purpose parallel machines have had to choose between poor lo...
This electronic version was submitted by the student author. The certified thesis is available in th...
We present a fast radix sorting algorithm that builds upon a microarchitecture-aware variant of coun...
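The abstract is cut off, so its building block is not fully stated. Assuming it is counting sort, the plain version below (not microarchitecture-aware) shows the three steps such a variant would presumably tune: a histogram, an exclusive prefix sum, and a stable scatter. The name `counting_sort_pass` and its parameters are hypothetical.

```python
from typing import List


def counting_sort_pass(keys: List[int], shift: int, digit_bits: int = 8) -> List[int]:
    """Stable counting sort on the digit at bit offset `shift` (hypothetical helper)."""
    radix = 1 << digit_bits
    mask = radix - 1

    # 1) Histogram: how many keys carry each digit value.
    counts = [0] * radix
    for k in keys:
        counts[(k >> shift) & mask] += 1

    # 2) Exclusive prefix sum: first output slot for each digit value.
    offsets = [0] * radix
    running = 0
    for d in range(radix):
        offsets[d] = running
        running += counts[d]

    # 3) Scatter in input order, so equal digits keep their relative order (stable).
    out = [0] * len(keys)
    for k in keys:
        d = (k >> shift) & mask
        out[offsets[d]] = k
        offsets[d] += 1
    return out
```

Applying this pass for shift = 0, r, 2r, ... up to the key width gives a least-significant-digit radix sort; the stability of each pass is what makes that composition correct.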
Sorting algorithms are the deciding factor for the performance of common operations such as removal ...
Radix sort is a classical algorithm to sort $N$ records with integer keys. The keys are represented ...
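Assuming the truncated sentence refers to the usual representation of keys as b-bit unsigned integers split into r-bit digits (the symbols b and r are introduced here, not taken from the abstract), the standard cost of least-significant-digit radix sort follows from one histogram, prefix-sum, and scatter pass per digit:

```latex
\text{passes} = \left\lceil \frac{b}{r} \right\rceil,
\qquad
T(N) = \Theta\!\left( \left\lceil \frac{b}{r} \right\rceil \left( N + 2^{r} \right) \right)
```

For example, b = 32 and r = 8 give 4 passes with 2^8 = 256 buckets each, while r = 16 halves the number of passes to 2 but needs 65,536 counters per pass, which may no longer fit in cache.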
Integer sorting is a subclass of the sorting problem where the elements have integer values and the ...
We have designed a radix sort algorithm for vector multiprocessors and have implemented the algorith...
Sorting is a commonly used process with a wide breadth of applications in the high perfor...
We explore efficient parallel radix sort for the AMD Fusion Accelerated Processing Unit (APU). Two c...
A fundamental challenge for parallel computing is to obtain high-level, architecture independent, al...
In this paper, we propose a taxonomy of parallel sorting that includes a broad range of array and f...
We describe the design of high-performance parallel radix sort and merge sort routines for manycore ...
Almost all computers regularly sort data. Many different sort algorithms have therefore been propose...