We present a hybrid CUDA-MPI sorting algorithm that uses GPU clusters to sort large data sets. Our algorithm has two phases. In the first phase, each node sorts a portion of the data on its GPU using a parallel bitonic sort. In the second phase, the sorted subsequences are merged in parallel using a reduction sorting network implemented in MPI across the cluster nodes. In performance tests against sequential quicksort, our algorithm achieves speed-ups of up to 9.8 when sorting 4 GB of data on a 32-node GPU cluster. We anticipate even better speed-ups on larger data sets and larger clusters.
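The two phases described above can be illustrated with a minimal sequential sketch in Python. It is not the authors' CUDA-MPI implementation: it simulates the GPU bitonic network and the cluster-wide merge on one machine, and it assumes the reduction step is a pairwise tree merge, which the abstract does not specify. The function names `bitonic_sort`, `tree_merge`, and `hybrid_sort` are hypothetical, and each node's chunk is assumed to have a power-of-two length.

```python
import heapq
import random


def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    For fixed k and j, the compare-exchange steps in the inner loop are
    independent of each other, so a GPU could run one thread per index.
    """
    data = list(values)
    n = len(data)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data


def tree_merge(sorted_chunks):
    """Merge sorted chunks pairwise, halving the active 'nodes' each round.

    Assumption: this stands in for the MPI reduction, where the node at
    rank + step would send its chunk to the node at rank.
    """
    chunks = [list(c) for c in sorted_chunks]
    step = 1
    while step < len(chunks):
        for rank in range(0, len(chunks) - step, 2 * step):
            chunks[rank] = list(heapq.merge(chunks[rank], chunks[rank + step]))
            chunks[rank + step] = []  # the sender is idle after this round
        step *= 2
    return chunks[0] if chunks else []


def hybrid_sort(data, num_nodes):
    """Phase 1: sort each node's chunk. Phase 2: merge across nodes."""
    chunk = len(data) // num_nodes
    parts = [data[r * chunk:(r + 1) * chunk] for r in range(num_nodes)]
    return tree_merge([bitonic_sort(p) for p in parts])


random.seed(0)
sample = [random.randrange(1000) for _ in range(4 * 16)]
assert hybrid_sort(sample, num_nodes=4) == sorted(sample)
```

With `num_nodes` simulated nodes, the merge phase takes about log2(`num_nodes`) rounds, and the number of nodes doing work halves after each round. In a real cluster, the merged sequence on the receiving node also doubles in size each round, which is one reason the merge phase can limit overall speed-up.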
In this paper, we present a novel approach for parallel sorting on stream processing architectures. ...
Although sort has been extensively studied in many research works, it still remains a challenge in p...
This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achie...
This paper presents a comparative analysis of the three widely used parallel sorting algorithms: Odd...
Sorting is a common problem in computer science. There are a lot of well-known sorting algorithms cr...
The traditional sorting technique, sequential sorting, is inefficient with increasing amounts of dat...
Sorting is a common problem in computer science. There are a lot of well-known sorting algo...
As a basic building block of many applications, sorting algorithms that efficiently run on modern ma...
The GPU is an effective architecture for sorting due to its massive parallelism and high memory band...
Novel "manycore" architectures, such as graphics processors, are high-parallel and high-performance s...
Sorting is a common problem in computer science. There are a lot of well-known sorting algorithms crea...
Sorting algorithms have been studied extensively for the past three decades. Their uses are found in m...
We describe the design of high-performance parallel radix sort and merge sort routines for manycore ...