We explore efficient parallel radix sort for the AMD Fusion Accelerated Processing Unit (APU). Two challenges arise: efficiently partitioning data between the CPU and GPU, and allocating data in memory regions. Our coarse-grained implementation utilizes both the GPU and CPU by sharing data at the beginning and end of the sort. Our fine-grained implementation utilizes the APU's integrated memory system to share data throughout the sort. Both implementations outperform the current state-of-the-art GPU radix sort from NVIDIA. We therefore demonstrate that the CPU can be efficiently used to speed up radix sort on the APU. Our fine-grained implementation slightly outperforms our coarse-grained implementation. This demonstrates the ...
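To make the coarse-grained scheme in the abstract above concrete, here is a minimal Python sketch, not the authors' implementation. It assumes non-negative 32-bit integer keys, runs both partitions sequentially on the host as stand-ins for the GPU and CPU, and combines them with a two-way merge. The abstract does not say how the partial results are combined, so the merge step, the function names, and the `gpu_fraction` parameter are all hypothetical.

```python
from heapq import merge


def radix_sort(keys, bits_per_pass=8, key_bits=32):
    """LSD radix sort for non-negative integers below 2**key_bits."""
    radix = 1 << bits_per_pass
    mask = radix - 1
    for shift in range(0, key_bits, bits_per_pass):
        # Histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum: first output slot for each digit value.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # Stable scatter into the output buffer.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys


def coarse_grained_sort(keys, gpu_fraction=0.75):
    """Split once, sort each part independently, combine once at the end.

    gpu_fraction is a hypothetical tuning knob: the share of keys handed
    to the faster device. Both parts run on the host in this sketch.
    """
    split = int(len(keys) * gpu_fraction)
    gpu_part = radix_sort(keys[:split])  # stand-in for the GPU partition
    cpu_part = radix_sort(keys[split:])  # stand-in for the CPU partition
    # Data is shared only here, at the end of the sort (assumed merge step).
    return list(merge(gpu_part, cpu_part))
```

Calling `coarse_grained_sort([5, 1, 4, 2, 3])` returns the keys in ascending order. A fine-grained variant would instead exchange data between the two devices during every digit pass, which this sketch does not attempt to model.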
We present a hybrid CUDA-MPI sorting algorithm that makes use of GPU clusters to sort large data set...
The traditional sorting technique, sequential sorting, is inefficient with increasing amounts of dat...
Sorting algorithms have been studied extensively for the past three decades. Their uses are found in m...
We describe the design of high-performance parallel radix sort and merge sort routines for manycore ...
We present a fast radix sorting algorithm that builds upon a microarchitecture-aware variant of coun...
The GPU is an effective architecture for sorting due to its massive parallelism and high memory band...
Sorting is a common problem in computer science. There are a lot of well-known sorting algorithms cr...
This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achie...
Sorting is a primitive operation that is a building block for countless algorithms. As such, it is i...
Graphics Processing Units (GPUs) are a fast-evolving architecture. Over the last decade their progra...
Sorting is a common problem in computer science. There are a lot of well-known sorting algo...
Sorting is an important problem in computing that has a rich history of investigation by various res...
Sorting algorithms are the deciding factor for the performance of common operations such as removal ...
As a basic building block of many applications, sorting algorithms that efficiently run on modern ma...