This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprograms (BLAS-3) library and the LINPACK Benchmark on the Fujitsu AP1000. Performance on these applications is regarded as an important measure for distributed memory architectures such as the AP1000. We discuss techniques for optimizing these applications without significantly sacrificing numerical stability; many of them also apply to other numerical applications. They include software pipelining and loop unrolling to optimize scalar processor computation, use of the AP1000's fast communication primitives (particularly row and column broadcasting using wormhole routing), blocking and partitioning methods, and ...
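The abstract names blocking and loop unrolling as two of its scalar-processor techniques. Both can be shown on a single-node matrix multiply C += AB. What follows is a minimal C sketch, not the AP1000 implementation. It assumes row-major n × n matrices and a hypothetical block size `NB = 32`. The names `naive_dgemm` and `blocked_dgemm` are illustrative only. The paper's version also uses software pipelining and the machine's communication primitives, and this sketch includes neither.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block size (assumption); real values are tuned to the
   cache of the target processor and are not taken from the paper. */
#define NB 32

static int min_int(int a, int b) { return a < b ? a : b; }

/* Reference triple loop: C += A * B for n x n row-major matrices. */
void naive_dgemm(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[(size_t)i * n + k] * B[(size_t)k * n + j];
            C[(size_t)i * n + j] += s;
        }
}

/* Blocked C += A * B: the i, k, j loops walk NB x NB tiles so that a tile
   of B is reused from cache, and the innermost j loop is unrolled by 4. */
void blocked_dgemm(int n, const double *A, const double *B, double *C) {
    assert(n >= 0);
    for (int ii = 0; ii < n; ii += NB)
        for (int kk = 0; kk < n; kk += NB)
            for (int jj = 0; jj < n; jj += NB) {
                int iend = min_int(ii + NB, n);
                int kend = min_int(kk + NB, n);
                int jend = min_int(jj + NB, n);
                for (int i = ii; i < iend; i++)
                    for (int k = kk; k < kend; k++) {
                        double a = A[(size_t)i * n + k]; /* held in a register across j */
                        const double *b = &B[(size_t)k * n];
                        double *c = &C[(size_t)i * n];
                        int j = jj;
                        /* Unrolled by 4: four independent multiply-adds per pass. */
                        for (; j + 3 < jend; j += 4) {
                            c[j]     += a * b[j];
                            c[j + 1] += a * b[j + 1];
                            c[j + 2] += a * b[j + 2];
                            c[j + 3] += a * b[j + 3];
                        }
                        /* Remainder when the tile width is not a multiple of 4. */
                        for (; j < jend; j++)
                            c[j] += a * b[j];
                    }
            }
}
```

Blocking keeps each tile of B in cache while it is reused across rows of A. Unrolling exposes four independent updates per iteration. That gives the compiler, or a hand-scheduled software pipeline, room to overlap loads with floating-point operations. The two functions sum in different orders, so on general floating-point data their results may differ in the last bits.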
This dissertation details contributions made by the author to the field of computer science while wo...
The ScaLAPACK library for parallel dense matrix computations is built on top of the BLACS communicat...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
This paper reports various results of the Linear Algebra Project on the Fujitsu AP1000 in ...
This paper discusses the design of linear algebra libraries for high performance computers. Particul...
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building bloc...
The aim of this project was to encapsulate the needs of computational science applications. Performa...
The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply ...
Dense linear algebra has been traditionally used to evaluate the performance and efficiency...
The DBLAS Distributed BLAS Library is a portable version of parallel BLAS that has been highly tuned...
In this paper we present the design and implementation of the Linpack benchmark for the IBM BladeCen...
The Basic Linear Algebra Subprograms, BLAS, are the basic computational kernels in most ap...
This paper describes the LINPACK Benchmark and some of its variations commonly used to assess the pe...
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra l...
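The LINPACK Benchmark discussed in the first abstract and in the LINPACK Benchmark report listed above solves a dense system Ax = b by Gaussian elimination with partial pivoting. It then reports a rate computed from a nominal operation count. The C sketch below is a minimal, unblocked version of that computation. It assumes a row-major n × n matrix, and the names `lu_factor`, `lu_solve` and `linpack_flops` are hypothetical. A parallel version such as the AP1000 one would instead distribute blocks of A and apply Level 3 BLAS updates. The count 2n³/3 + 2n² is the one used by the classic LINPACK benchmark driver.

```c
#include <assert.h>
#include <stddef.h>

static double dabs(double x) { return x < 0.0 ? -x : x; }

/* In-place LU factorization with partial pivoting (unblocked, right-looking).
   On return A holds unit-lower L below the diagonal and U on and above it;
   piv[k] is the row swapped with row k. Returns -1 on a zero pivot. */
int lu_factor(int n, double *A, int *piv) {
    assert(n >= 0);
    for (int k = 0; k < n; k++) {
        /* Partial pivoting: pick the largest |A[i][k]| for i >= k. */
        int p = k;
        for (int i = k + 1; i < n; i++)
            if (dabs(A[(size_t)i * n + k]) > dabs(A[(size_t)p * n + k]))
                p = i;
        piv[k] = p;
        if (A[(size_t)p * n + k] == 0.0)
            return -1;
        if (p != k) /* swap full rows k and p */
            for (int j = 0; j < n; j++) {
                double t = A[(size_t)k * n + j];
                A[(size_t)k * n + j] = A[(size_t)p * n + j];
                A[(size_t)p * n + j] = t;
            }
        /* Store multipliers and apply the rank-1 update to the trailing block. */
        for (int i = k + 1; i < n; i++) {
            A[(size_t)i * n + k] /= A[(size_t)k * n + k];
            double l = A[(size_t)i * n + k];
            for (int j = k + 1; j < n; j++)
                A[(size_t)i * n + j] -= l * A[(size_t)k * n + j];
        }
    }
    return 0;
}

/* Solve A x = b with the factors from lu_factor; b is overwritten with x. */
void lu_solve(int n, const double *LU, const int *piv, double *b) {
    for (int k = 0; k < n; k++) { /* apply the row interchanges in order */
        double t = b[k];
        b[k] = b[piv[k]];
        b[piv[k]] = t;
    }
    for (int i = 0; i < n; i++)           /* forward substitution, unit L */
        for (int j = 0; j < i; j++)
            b[i] -= LU[(size_t)i * n + j] * b[j];
    for (int i = n - 1; i >= 0; i--) {    /* back substitution with U */
        for (int j = i + 1; j < n; j++)
            b[i] -= LU[(size_t)i * n + j] * b[j];
        b[i] /= LU[(size_t)i * n + i];
    }
}

/* Nominal operation count used by the classic LINPACK benchmark driver. */
double linpack_flops(int n) {
    double dn = (double)n;
    return 2.0 * dn * dn * dn / 3.0 + 2.0 * dn * dn;
}
```

To get a benchmark rate from this sketch, divide `linpack_flops(n)` by the measured time of `lu_factor` plus `lu_solve`. Partial pivoting is the standard safeguard behind the numerical stability that the first abstract aims to preserve while optimizing.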