This paper reports results from the Linear Algebra Project on the Fujitsu AP1000 in 1993. These include a general implementation of the Distributed BLAS Level 3 subroutines (for the scattered storage scheme). The performance and user interface issues of the implementation are discussed. Implementations of Distributed BLAS-based LU Decomposition, Cholesky Factorization and Star Product algorithms are described. The porting of the Basic Fourier Functions, from the Fujitsu-ANU Area-4 Project, to the AP1000 is also discussed. While the parallelization of the main FFT algorithm involves communication only in a single 'transposition' step, several optimizations, including fast calculation of roots of unity, are required for...
This dissertation details contributions made by the author to the field of computer science while wo...
The paper presents a novel principle for constructing a new class of highly parallel fast stable nu...
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building bloc...
This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprogram (BLAS-3) li...
The DBLAS Distributed BLAS Library is a portable version of parallel BLAS that has been highly tuned...
This paper discusses the design of linear algebra libraries for high performance computers. Particul...
In this paper we consider the data distribution and data movement issues related to the solution of ...
Basic Linear Algebra Subroutines (BLAS-3) [1] are building blocks to solve a lot of numerical prob...
This article discusses the core factorization routines included in the ScaLAPACK library. These rout...
This paper describes the design, implementation and performance of a parallel direct dense symmetric...
Massively parallel computer systems, having thousands of identical processors operating in SIMD mode...
Linear algebra kernels are at the core of many scientific applications. We propose a unified, perfor...
One of the key areas for enabling users to efficiently use an HPC system is providing optimized BLAS...
SuperLU_DIST is a distributed memory parallel solver for sparse linear systems. The solver makes sev...