Basic Linear Algebra Subroutines (BLAS-3) [1] are building blocks for solving many numerical problems (Cholesky factorization, Gram-Schmidt orthonormalization, LU decomposition, ...). Their efficient implementation on a given parallel machine is a key issue for the maximal exploitation of the system's computational power. In this work we refer to a massively parallel SIMD machine (the APE100/Quadrics [2]) and to the adoption of the hyper-systolic method [3, 6, 4] to implement BLAS-3 efficiently on such a machine. The results we achieved (nearly 60-70% of peak performance for large matrices) demonstrate the validity of the proposed approach. The work is structured as follows: section 1 is devoted to a review of BLAS-3, in se...
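The central BLAS-3 operation is GEMM, the matrix-matrix update C <- alpha*A*B + beta*C. As a point of reference for what the routines above implement in parallel (this is only a naive sequential sketch, not the hyper-systolic scheme itself, and the function name `gemm` is illustrative, not the BLAS binding):

```python
def gemm(alpha, A, B, beta, C):
    """Naive reference GEMM: C <- alpha*A*B + beta*C.

    A is m x k, B is k x n, C is m x n, all as lists of lists.
    Sequential reference only; a tuned or parallel BLAS-3
    implementation blocks and distributes these loops.
    """
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):          # inner product of row i of A and column j of B
                acc += A[i][p] * B[p][j]
            C[i][j] = alpha * acc + beta * C[i][j]
    return C
```

Higher-level kernels such as Cholesky or LU factorization are then organized so that most of their arithmetic is funneled through this one routine.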
This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprogram (BLAS-3) li...
One of the key areas for enabling users to efficiently use an HPC system is providing optimized BLAS...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
A scalable parallel algorithm for matrix multiplication on SIMD computers is presented. Our method...
Hyper-systolic algorithms represent a new class of parallel computing structures. Because of their r...
This paper is devoted to a new systolic parallelization scheme for matrix-matrix multiplication that...
A novel parallel algorithm for matrix multiplication is presented. It is based on a 1-D hyper-systol...
A profile is given of current research, as it pertains to computational mathematics, on Very...
We investigate the performance gains from hyper-systolic implementations of n2-loop problems on the ...
The Level 3 BLAS (BLAS3) are a set of specifications of FORTRAN 77 subprograms for carrying out matr...
Massively parallel computer systems, having thousands of identical processors operating in SIMD mode...
This paper gives a report on various results of the Linear Algebra Project on the Fujitsu AP1000 in ...