This research aims to provide a framework for describing algorithmic redistribution methods for various block cyclic decompositions. To do so, properties of this data distribution scheme are formally exhibited, and their application is illustrated by examining a number of basic dense linear algebra operations. The study analyzes the extent to which the general two-dimensional block cyclic data distribution allows for the expression of matrix operations that are both efficient and flexible. It also quantifies, theoretically and practically, how much of the efficiency of optimal block cyclic data layouts can be maintained. The general block cyclic decomposition scheme is shown to allow for the expression of flexibl...
We study the implementation of dense linear algebra computations, such as matrix multiplication and ...
This article is devoted to the run-time redistribution of one-dimensional arrays that are distribute...
Implementing linear algebra kernels on distributed memory parallel computers raises the problem of d...
In this paper, we present a new load balancing technique, called panel scattering, which is generall...
A significant part of scientific codes consists of sparse matrix computations. In this work we propos...
Array redistribution is usually required to enhance algorithm performance in many parall...
In many scientific applications, dynamic array redistribution is usually required to enh...