On cache-based computer architectures using current standard algorithms, Householder bidiagonalization requires a significant portion of the execution time for computing matrix singular values and vectors. In this paper we reorganize the sequence of operations for Householder bidiagonalization of a general m × n matrix, so that two (GEMV) vector-matrix multiplications can be done with one pass of the unreduced trailing part of the matrix through cache. Two new BLAS 2.5 operations approximately cut in half the transfer of data from main memory to cache. We give detailed algorithm descriptions and compare timings with the current LAPACK bidiagonalization algorithm.
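The single-pass idea in this abstract can be illustrated with a minimal sketch (the function name is hypothetical; the paper's actual BLAS 2.5 routines are not reproduced here): one sweep over the rows of A serves both a transposed and an untransposed matrix-vector product, so A crosses the memory hierarchy once instead of twice.

```python
import numpy as np

def fused_gemv_pair(A, v, z):
    """Compute u = A^T v and w = A z in a single sweep over A.

    Illustrative sketch only: in a cache-based implementation the
    row blocks of A would be loaded into cache once and reused for
    both products, roughly halving main-memory traffic compared
    with two separate GEMV calls.
    """
    m, n = A.shape
    u = np.zeros(n)
    w = np.zeros(m)
    for i in range(m):       # each row of A is read exactly once ...
        row = A[i]
        u += v[i] * row      # ... contributing to the A^T v accumulation
        w[i] = row @ z       # ... and to the A z product
    return u, w
```

In a compiled BLAS 2.5 kernel the same reuse would be done block-wise rather than row-wise, but the data-movement argument is the same.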
The sparse matrix–vector (SpMV) multiplication is an important kernel in many applications. When the...
Abstract. The objective of this paper is to extend, in the context of multicore architectures, the c...
In this paper, a new methodology for computing the Dense Matrix Vector Multipl...
In this thesis we introduce a cost measure to compare the cache-friendliness of different permutati...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
This Master's thesis examines whether a matrix multiplication program that combines the two efficiency stra...
We study tiled algorithms for going from a "full" matrix to a condensed "ba...
Sparse matrix-vector multiplication (shortly SpMV) is one of the most common subroutines in the numerica...
Abstract. In this work, the performance of the basic and Strassen's matrix multiplication algorithms ar...
A simple but highly effective approach for transforming high-performance implementations on cachebas...
This report deals with the efficient calculation of matrix-matrix multiplication, without using explici...
The goal of the LAPACK project is to provide efficient and portable software for dense numerical lin...
In this article, we introduce a cache-oblivious method for sparse matrix–vector multiplication. Our ...