The Connection Machine Scientific Software Library (CMSSL) is a library of scientific routines designed for distributed memory architectures. The BLAS of the CMSSL have been implemented as a two-level structure to exploit optimizations local to nodes and across nodes. This paper presents the implementation considerations and performance of the Local BLAS, or BLAS local to each node of the system. A wide variety of loop structures and unrollings have been implemented in order to achieve a uniform and high performance, irrespective of the data layout in node memory. The CMSSL is the only existing high-performance library capable of supporting both the data parallel and message passing modes of programming a distributed memory computer. The im...
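The abstract credits uniform performance to many loop structures and unrollings chosen per data layout. As a rough illustration only, here is a minimal Python sketch of one such variant: a strided dot product (a level-1 operation) whose main loop is unrolled by four, followed by a cleanup loop for the remainder. The function name, the unroll factor of four, and the restriction to positive strides are assumptions made for illustration; they are not the CMSSL's actual kernels.

```python
def unrolled_strided_dot(n, x, incx, y, incy):
    """Dot product of n elements of x and y read with positive strides.

    Hypothetical illustration: the main loop is unrolled by 4 (an assumed
    factor) and a cleanup loop handles the n % 4 leftover elements.
    """
    ix, iy = 0, 0
    m = n - n % 4  # elements covered by the unrolled loop
    # Four independent partial sums, as a hand-tuned node kernel might use
    # to keep several multiply-adds in flight at once.
    s0 = s1 = s2 = s3 = 0.0
    for _ in range(0, m, 4):
        s0 += x[ix] * y[iy]
        s1 += x[ix + incx] * y[iy + incy]
        s2 += x[ix + 2 * incx] * y[iy + 2 * incy]
        s3 += x[ix + 3 * incx] * y[iy + 3 * incy]
        ix += 4 * incx
        iy += 4 * incy
    total = (s0 + s1) + (s2 + s3)
    # Cleanup loop for the remaining n % 4 elements.
    for _ in range(n - m):
        total += x[ix] * y[iy]
        ix += incx
        iy += incy
    return total
```

For instance, `unrolled_strided_dot(5, x, 2, y, 1)` reads every other element of `x` against the first five elements of `y`. A different layout changes only `incx` and `incy` here, whereas a tuned library might instead dispatch to a separate kernel specialized for unit-stride data.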
SuperLU_DIST is a distributed memory parallel solver for sparse linear systems. The solver makes sev...
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been impleme...
The problem of executing large BLAS (basic linear algebra subprograms) Level-2 operations, such as m...
The Connection Machine Scientific Software Library (CMSSL) is a library of scientific routines designed for distributed memory architectures. The basic...
We describe a subset of the level-1, level-2, and level-3 BLAS implemented for each node of the Conn...
This paper discusses the design of linear algebra libraries for high performance computers. Particul...
Massively parallel computing holds the promise of extreme performance. The utility of these systems ...
Massively parallel processors introduce new demands on software systems with respect to performance,...
Massively parallel processors introduce new demands on software systems with respect to performance...
This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprograms (BLAS-3) li...
This paper describes the design of ScaLAPACK, a scalable software library for performing dense and b...
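The abstract breaks off before describing data layout, but ScaLAPACK is known to distribute matrices in a two-dimensional block-cyclic layout over a process grid. The minimal Python sketch below shows only the one-dimensional index arithmetic. It assumes 0-based indices, a block size `nb`, `nprocs` processes, and a first block owned by process `src`; the function names are hypothetical and are not ScaLAPACK routines. The two-dimensional layout applies the same mapping independently to row indices (over process rows) and to column indices (over process columns).

```python
def global_to_local(g, nb, nprocs, src=0):
    """Map 0-based global index g to (owning process, local index)
    under a 1-D block-cyclic distribution with block size nb."""
    block = g // nb                    # global block containing g
    proc = (block + src) % nprocs      # blocks are dealt out round-robin
    local_block = block // nprocs      # earlier blocks held by that process
    local = local_block * nb + g % nb  # position in the process's local array
    return proc, local


def local_to_global(l, proc, nb, nprocs, src=0):
    """Inverse mapping: local index l on process proc back to global index."""
    local_block = l // nb
    rel = (proc - src) % nprocs        # position of proc in the round-robin
    block = local_block * nprocs + rel
    return block * nb + l % nb
```

For example, with `nb = 2` and three processes, global index 7 lies in block 3, which wraps back to process 0 as that process's second block, so it is stored at local index 3.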
Combinatorial algorithms such as those that arise in graph analysis, modeling of discrete systems, b...
Many optimizations (of programs with loops) used in parallelizing compilers and systolic array desig...
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra l...
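BLIS is known for expressing GEMM as a nest of loops around a small micro-kernel, so that only the micro-kernel needs architecture-specific tuning. The Python sketch below illustrates that loop structure under simplifying assumptions. The block sizes `MR`, `NR`, `KC`, `MC`, `NC` are arbitrary illustrative values, the function names are hypothetical, and the packing of A and B into contiguous buffers that BLIS performs between loops is omitted. It is not BLIS's API or its actual kernels.

```python
def micro_kernel(kc, A, B, C, i0, j0, p0, mr, nr):
    """C[i0:i0+mr, j0:j0+nr] += A[i0:i0+mr, p0:p0+kc] @ B[p0:p0+kc, j0:j0+nr].

    In a BLIS-style design this is the only architecture-specific piece;
    here it is plain Python, accumulating kc rank-1 updates.
    """
    for p in range(p0, p0 + kc):
        brow = B[p]
        for i in range(i0, i0 + mr):
            a = A[i][p]
            crow = C[i]
            for j in range(j0, j0 + nr):
                crow[j] += a * brow[j]


def blocked_gemm(A, B, C, MR=4, NR=4, KC=8, MC=8, NC=16):
    """C += A @ B for lists of lists, using five loops around micro_kernel."""
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, NC):                      # loop 5: column panels of B and C
        nc = min(NC, n - jc)
        for pc in range(0, k, KC):                  # loop 4: slices of the inner dimension
            kc = min(KC, k - pc)
            for ic in range(0, m, MC):              # loop 3: row blocks of A and C
                mc = min(MC, m - ic)
                for jr in range(jc, jc + nc, NR):   # loop 2: NR-wide column strips
                    nr = min(NR, jc + nc - jr)
                    for ir in range(ic, ic + mc, MR):  # loop 1: MR-tall row strips
                        mr = min(MR, ic + mc - ir)
                        micro_kernel(kc, A, B, C, ir, jr, pc, mr, nr)
    return C
```

Keeping the outer loops generic and confining the arithmetic to `micro_kernel` is the point of this structure: in this picture, porting to a new machine would mean rewriting only that one function.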