This paper explores the computation and communication performance of the Level 1 Basic Linear Algebra Subprograms (BLAS-1) and related kernels on the SGI/Cray Origin 2000 parallel computer. Experiments are performed on vendor-supplied mathematical library routines as well as on hand-coded loops and array syntax. The goal of this study is to gain a better understanding of performance issues pertaining to the Origin 2000 architecture.
The Connection Machine Scientific Software Library (CMSSL) is a library of scientific routines desig...
This paper summarizes the BLAS Technical Forum Standard, a specification of a set of kernel routine...
(CMSSL) is a library of scientific routines designed for distributed memory architectures. The basic...
The Basic Linear Algebra Subprograms, BLAS, are the basic computational kernels in most ap...
This working note examines different Fortran implementations of the Basic Linear Algebra Subprograms...
This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprogram (BLAS-3) li...
The increase in performance of the last generations of graphics processors (GPUs) has made this clas...
Massively parallel computer systems, having thousands of identical processors operating in SIMD mode...
This work reviews the experience of implementing different versions of the SSPR rank-one update oper...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
Today, most of the Cray multiprocessor systems are still used within a multiprogramming environment....
This paper gives a report on various results of the Linear Algebra Project on the Fujitsu AP1000 in ...
SuperLU_DIST is a distributed memory parallel solver for sparse linear systems. The solver makes sev...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...