The DBLAS (Distributed BLAS) Library is a portable version of parallel BLAS that has been highly tuned for the Fujitsu AP1000 and AP+. In this paper, we describe performance enhancements made for two very different high-performance distributed-memory platforms, the Fujitsu AP3000 and the Fujitsu VPP-300. Even when highly tuned, vendor-supplied serial BLAS implementations are available, cell computation speed still needs attention: serial BLAS supplies neither a local matrix transpose routine (which is needed in many places) nor routines that adequately handle the triangular matrices which arise in the parallel context. We will describe the differing principles used on the UltraSPARC and VPP-300 nodes to opti...
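The local matrix transpose mentioned above is the kind of routine a serial BLAS does not provide. As a rough illustration only, here is a minimal sketch of a tiled out-of-place transpose on a row-major buffer. It is not taken from DBLAS; the function name `blocked_transpose`, the flat-list storage, and the default tile size are all assumptions made for this example.

```python
def blocked_transpose(a, rows, cols, block=4):
    """Transpose a row-major rows x cols matrix held in a flat list.

    Hypothetical helper for illustration; not part of DBLAS or any BLAS.
    The matrix is walked tile by tile so that reads and writes stay
    within a small block x block region at a time.
    """
    if len(a) != rows * cols:
        raise ValueError("flat buffer length must equal rows * cols")
    out = [0.0] * (rows * cols)
    for i0 in range(0, rows, block):          # tile row origin
        for j0 in range(0, cols, block):      # tile column origin
            for i in range(i0, min(i0 + block, rows)):
                for j in range(j0, min(j0 + block, cols)):
                    # element (i, j) of the input becomes (j, i) of the output
                    out[j * rows + i] = a[i * cols + j]
    return out


if __name__ == "__main__":
    m = [0, 1, 2,
         3, 4, 5]                 # 2 x 3 matrix, row-major
    print(blocked_transpose(m, 2, 3, block=2))
```

On a real node the tile size would be chosen to suit the cache or vector hardware; the value used here is arbitrary.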
The Basic Linear Algebra Subprograms, BLAS, are the basic computational kernels in most ap...
The problem of executing large BLAS (basic linear algebra subprograms) Level-2 operations, such as m...
The Connection Machine Scientific Software Library (CMSSL) is a library of scientific routines desig...
This paper gives a report on various results of the Linear Algebra Project on the Fujitsu AP1000 in ...
This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprogram (BLAS-3) li...
This paper describes the design, implementation and performance of a parallel direct dense symmetric...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
A research agreement between the Australian National University and Fujitsu Japan has led to the dev...
The performance of a parallel matrix-matrix-multiplication routine with the same functionality as DG...
SuperLU_DIST is a distributed memory parallel solver for sparse linear systems. The solver makes sev...
A scalable parallel algorithm for matrix multiplication on SISAMD computers is presented. Our method...
A simple but highly effective approach for transforming high-performance implementations on cachebas...
BLIS is a new framework for rapid instantiation of the BLAS. We describe how BLIS extends the “GotoB...