To get maximum performance on many-core graphics processors, it is important to balance the workload evenly so that all processing units contribute equally to the task at hand. This can be hard to achieve when the cost of a task is not known beforehand and when new sub-tasks are created dynamically during execution. With the recent advent of scatter operations and atomic hardware primitives, it is now possible to bring some of the more elaborate dynamic load balancing schemes from the conventional SMP systems domain to the graphics processor domain. We have compared four different dynamic load balancing methods to see which one is best suited to the highly parallel world of graphics processors. Three of these methods were lock-f...
In parallel iterative applications, computational efficiency is essential for addressing large probl...
Programs developed under the Compute Unified Device Architecture obtain the highest performance rate...
The overall efficiency of parallel algorithms is most decisively affected by the strategy applied fo...
To get maximum performance on the many-core graphics processors it is important to have an even balan...
Abstract — To get maximum performance on the many-core graphics processors, it is important to have ...
In this chapter, we present a methodology for efficient load balancing of computational problems tha...
In this paper we present GPU-Quicksort, an efficient Quicksort algorithm suitable for highly paralle...
In parallel computing, obtaining maximal performance is often mandatory to solve large and complex p...
In this paper we revisit the design of concurrent data structures -- specifically queues -- and exam...
The convergence of highly parallel many-core graphics processors with conventional multi-core proces...
The computational power provided by many-core graphics processing units (GPUs) has been exploited i...
Multicomputer systems based on message passing attract attention in the field of high performance co...
A large class of computational problems is characterised by frequent synchronisation, and computati...
We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work pro...
In this paper, we present a cohesive, practical load balancing framework that addresses many short...