Abstract: Suppose the bits of a computer word are partitioned into d disjoint sets, each of which is used to represent one of a d-tuple of Cartesian indices into d-dimensional space. Then, regardless of the partition, simple group operations and comparisons can be implemented for each index on a conventional processor in a sequence of two or three register operations. These indexings allow any blocked algorithm from linear algebra to use non-standard matrix orderings that increase locality and enhance performance. The underlying implementations were designed for alternating bit positions to index Morton-ordered matrices, but they apply equally well to any bit partitioning. A hybrid ordering of the elements of a matrix becomes possi...
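The two-or-three-operation claim in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names (deposit, dilated_add, dilated_sub), the 32-bit word size, and the sample masks are all assumptions made for illustration. The idea shown is the standard masked-arithmetic trick: filling the bits outside an index's partition with ones lets a carry ripple across them during addition, while leaving them zero lets a borrow ripple across them during subtraction.

```python
WORD = 0xFFFFFFFF          # assumed 32-bit machine word
EVEN = 0x55555555          # assumed partition: even bit positions (e.g. column index)
ODD = 0xAAAAAAAA           # assumed partition: odd bit positions (e.g. row index)


def deposit(value, mask):
    """Scatter the low bits of value into the set bit positions of mask.

    This loop is only a helper for building test data; it is not one of the
    two-or-three-operation primitives.
    """
    result = 0
    bit = 1
    while mask:
        low = mask & -mask         # lowest set bit still left in the mask
        if value & bit:
            result |= low
        mask ^= low
        bit <<= 1
    return result


def dilated_add(a, b, mask):
    """Add two indices stored in the bit positions selected by mask.

    OR-ing a with the complement of the mask fills the gaps with ones, so a
    carry out of one index bit propagates to the next index bit.
    """
    return ((a | (~mask & WORD)) + b) & mask


def dilated_sub(a, b, mask):
    """Subtract b from a; the zero gaps in a let borrows propagate."""
    return (a - b) & mask


def morton_index(row, col):
    """Interleave row (odd bits) and column (even bits) into one word."""
    return deposit(row, ODD) | deposit(col, EVEN)
```

Because deposit places higher index bits in higher word positions, two indices stored under the same mask can be compared directly with the ordinary integer comparison operators, with no extraction step. On a real processor the addition and subtraction above each compile to two or three register instructions; the Python version only models that arithmetic.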
In this article, we introduce a cache-oblivious method for sparse matrix–vector multiplication. Our ...
Sparse matrix-vector multiplication (SpMV for short) is one of the most common subroutines in the numerica...
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been impleme...
A proof of concept is offered for the uniform representation of matrices serially in Morton-order (o...
This report has been developed over the work done in the deliverable [Nava94]. There it was shown tha...
During the last half-decade, a number of research efforts have centered around developing software f...
This paper discusses optimizing computational linear algebra algorithms on a ring cluster of IBM R...
We develop a prototype library for in-place (dense) matrix storage format conversion between the ca...
Strassen's algorithm for matrix multiplication gains its lower arithmetic complexity at the expe...
Abstract. We present a recursive way to partition hypergraphs which creates and exploits hypergraph ...
In this paper, we analyse and compare the techniques of algorithmic blocking and (storage blocking w...
Expressions that involve matrices and vectors, known as linear algebra expressions, are commonly eva...
Abstract. We describe fast parallel algorithms for building index data structures that can be used to ...
Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate...
We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar...