We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single and double precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on an NVIDIA C1060. Motivated to find a still faster implementation, we further consider "wildly asynchronous" implementations that can reduce or even eliminate the synchronization bottleneck between iterations. In these versions, which are based on the principle of chaotic relaxation (Chazan and Miranker, 1969), we simply remove or delay synchronization between iterations, thereby potentially trading off more flops (via more iterations to converge) for...
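To make the contrast between a synchronized Jacobi sweep and a chaotic relaxation concrete, here is a minimal serial sketch in Python. It is not the paper's multi-GPU code. The grid size, the right-hand side f = 1, the zero Dirichlet boundary, the stopping rule, and the function names `jacobi_sweep`, `chaotic_sweep`, and `solve` are all assumptions made for illustration. The unsynchronized variant is only emulated: points are updated in place in a random order, so each update may read a mix of old and new neighbor values, which is roughly what a missing barrier between iterations allows.

```python
import random


def jacobi_sweep(u, f, h):
    """One synchronous Jacobi sweep for -laplace(u) = f on a square grid.

    Reads only the old grid `u` and writes a fresh grid, which is what a
    barrier between iterations guarantees. Returns (new_grid, max_update).
    """
    n = len(u)
    new = [row[:] for row in u]
    delta = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]
                                + h * h * f[i][j])
            delta = max(delta, abs(new[i][j] - u[i][j]))
    return new, delta


def chaotic_sweep(u, f, h, rng):
    """Serial emulation (an assumption, not real asynchrony) of an
    unsynchronized sweep: in-place updates in random order, so neighbors
    may be stale or fresh. Returns the largest update made."""
    n = len(u)
    points = [(i, j) for i in range(1, n - 1) for j in range(1, n - 1)]
    rng.shuffle(points)
    delta = 0.0
    for i, j in points:
        val = 0.25 * (u[i - 1][j] + u[i + 1][j]
                      + u[i][j - 1] + u[i][j + 1]
                      + h * h * f[i][j])
        delta = max(delta, abs(val - u[i][j]))
        u[i][j] = val
    return delta


def solve(n=17, tol=1e-8, max_iters=20000, chaotic=False, seed=0):
    """Iterate until the largest pointwise update drops below `tol`.

    Hypothetical test problem: unit square, f = 1, u = 0 on the boundary.
    Returns (grid, number_of_sweeps).
    """
    h = 1.0 / (n - 1)
    u = [[0.0] * n for _ in range(n)]
    f = [[1.0] * n for _ in range(n)]
    rng = random.Random(seed)
    for it in range(1, max_iters + 1):
        if chaotic:
            delta = chaotic_sweep(u, f, h, rng)
        else:
            u, delta = jacobi_sweep(u, f, h)
        if delta < tol:
            return u, it
    return u, max_iters


if __name__ == "__main__":
    u_sync, it_sync = solve(chaotic=False)
    u_chaos, it_chaos = solve(chaotic=True)
    mid = len(u_sync) // 2
    print("synchronous sweeps:", it_sync, "center value:", u_sync[mid][mid])
    print("chaotic sweeps:    ", it_chaos, "center value:", u_chaos[mid][mid])
```

Both variants should settle on essentially the same discrete solution, though the sweep counts can differ. A serial emulation says nothing about GPU bandwidth or about the actual cost of synchronization, so it illustrates only the convergence side of the trade-off.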
In this paper we propose and evaluate a set of new strategies for the solution of three dimensional ...
Multi-core architectures are becoming more common and core counts continue to increase. There are s...
This paper explores the need for asynchronous iteration algorithms as smoothers in multigrid methods...
In this paper we investigate how stencil computations can be implemented on state-of-the-art...
In this thesis, we evaluate the interference between multiple GPU (Graphics processing unit) kernels...
In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Un...
We study the impact of asynchronism on parallel iterative algorithms in the pa...
Heterogeneous processors with accelerators provide an opportunity to improve performance within a...
Time series motif (similarities) and discords discovery is one of the most important and challenging...
Recent technological and economic developments have led to widespread availability of multi-core CP...
Mathematicians and computational scientists are often limited in their ability to model complex phen...
In this paper, we present Patus, a code generation and auto-tuning framework for stencil computation...
An emerging trend in processor architecture seems to indicate the doubling of the number of cores pe...