Poor scalability on parallel architectures can be attributed to several factors, among which idle times, data movement, and runtime overhead are predominant. Conventional parallel loops and nested parallelism have proved successful for regular computational patterns. For more complex and irregular cases, however, these methods often perform poorly because they consider only a subset of these costs. Although data-driven methods are gaining popularity for efficiently utilizing computational cores, their data movement and runtime costs can be prohibitive for highly dynamic and irregular algorithms, such as fast multipole methods (FMMs). Furthermore, loop tiling, a technique that promotes data locality and has been successful for regular parall...
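To make the contrast concrete, the sketch below shows loop tiling on a regular kernel in C with OpenMP (a generic illustration with assumed array names and sizes, not code from the paper): the tiled variant walks TILE x TILE blocks so each block's working set stays cache-resident, reducing data movement without extra computation.

```c
#define N    1024
#define TILE 64            /* tile edge sized so a block fits in cache */

static double a[N][N], b[N][N];

/* Flat parallel loop: threads stream over whole rows, so the column
 * accesses to b evict cache lines long before they are reused. */
void transpose_flat(void) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            b[j][i] = a[i][j];
}

/* Tiled variant: the same work, reordered into TILE x TILE blocks so
 * both the read and the write footprint stay cache-resident. */
void transpose_tiled(void) {
    #pragma omp parallel for collapse(2)
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = ii; i < ii + TILE; i++)
                for (int j = jj; j < jj + TILE; j++)
                    b[j][i] = a[i][j];
}
```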
Modern computer architectures expose an increasing number of parallel features supported by complex ...
Among the algorithms that are likely to play a major role in future exascale computing, the fast mul...
Parallel task-based programming models like OpenMP support the declaration of task data dependences....
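A minimal sketch of that mechanism (generic OpenMP 4.0 depend clauses; the variables are illustrative): the runtime orders the consumer after both producers purely from the declared dependences, with no explicit synchronization written between the tasks.

```c
#include <stdio.h>

int main(void) {
    double x = 0.0, y = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: x)   /* producer of x */
        x = 1.0;

        #pragma omp task depend(out: y)   /* producer of y */
        y = 2.0;

        /* Consumer: may only start once both producers have finished;
         * the ordering is derived from the depend clauses alone. */
        #pragma omp task depend(in: x, y)
        printf("x + y = %.1f\n", x + y);
    }
    return 0;
}
```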
In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming ...
Future multi- and many-core processors are likely to have tens of cores arranged in a tiled archite...
Locality of computation is key to obtaining high performance on a broad variety of parallel architec...
The emergence of multicore and manycore processors is set to change the parallel computing world. Ap...
With the advent of complex modern architectures, the low-level paradigms long ...
The key common bottleneck in most stencil codes is data movement, and prior research has shown that ...
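As a baseline illustration of that bottleneck and the standard mitigation, here is a spatially blocked 5-point Jacobi sweep in C/OpenMP (assumed grid sizes and tile edge; a generic sketch, not the cited work's code): each TILE x TILE block of the input stays cache-resident while its output block is produced.

```c
#define NX   2048
#define NY   2048
#define TILE 128

/* One sweep of a 5-point stencil, spatially blocked: without blocking,
 * each sweep streams the full grid through cache; with it, DRAM traffic
 * approaches the compulsory minimum of one read and one write per point. */
void jacobi_sweep(const double in[NY][NX], double out[NY][NX]) {
    #pragma omp parallel for collapse(2)
    for (int jj = 1; jj < NY - 1; jj += TILE)
        for (int ii = 1; ii < NX - 1; ii += TILE) {
            int jmax = jj + TILE < NY - 1 ? jj + TILE : NY - 1;
            int imax = ii + TILE < NX - 1 ? ii + TILE : NX - 1;
            for (int j = jj; j < jmax; j++)
                for (int i = ii; i < imax; i++)
                    out[j][i] = 0.25 * (in[j-1][i] + in[j+1][i]
                                      + in[j][i-1] + in[j][i+1]);
        }
}
```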
Task parallelism raises the level of abstraction in shared memory parallel programming to simplify t...
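The abstraction in one picture (a hedged, generic sketch, not any particular paper's code): the programmer only marks which calls may run concurrently and where their results are needed; scheduling, load balancing, and thread management are left to the runtime.

```c
#include <stddef.h>

/* Recursive divide-and-conquer sum. Call from inside a
 * '#pragma omp parallel' + '#pragma omp single' region. */
static double task_sum(const double *v, size_t n) {
    if (n < 4096) {                 /* serial cutoff bounds task overhead */
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += v[i];
        return s;
    }
    double left;
    #pragma omp task shared(left)   /* left half may run concurrently */
    left = task_sum(v, n / 2);
    double right = task_sum(v + n / 2, n - n / 2);
    #pragma omp taskwait            /* 'left' is needed below */
    return left + right;
}
```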
We present efficient algorithms to build data structures and the lists needed for fast multipole met...
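The specific list-building algorithms are not reproduced here; as a generic ingredient of FMM tree construction, the sketch below computes 30-bit Morton keys via the standard bit-interleaving trick: sorting points by key groups them into octree leaves, from which neighbor and interaction lists can then be derived.

```c
#include <stdint.h>

/* Spread the low 10 bits of v so that two zero bits separate each
 * original bit (classic bit-interleaving sequence). */
static uint32_t expand_bits(uint32_t v) {
    v &= 0x3FFu;
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

/* 30-bit Morton key for integer grid coordinates in [0, 1024). */
static uint32_t morton3d(uint32_t x, uint32_t y, uint32_t z) {
    return (expand_bits(x) << 2) | (expand_bits(y) << 1) | expand_bits(z);
}
```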
Data parallel operations are widely used in game, multimedia, physics and data-intensive and scienti...
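A minimal data-parallel example (illustrative C/OpenMP, not from the cited work): an element-wise map fused with a reduction, the two archetypal data-parallel operations.

```c
#include <math.h>

/* Root-mean-square of a vector: the map (squaring) is applied to every
 * element independently; the reduction combines the partial sums. */
double rms(const float *x, int n) {
    double acc = 0.0;
    #pragma omp parallel for reduction(+: acc)
    for (int i = 0; i < n; i++) {
        double v = (double)x[i];   /* map */
        acc += v * v;              /* reduce */
    }
    return sqrt(acc / (double)n);
}
```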
This paper presents a simple method to reduce performance loss due to a parallel program's massive c...