Matrix transposition is a fundamental operation, but for large matrices it can exhibit a very low and hard-to-predict data cache hit ratio. Real-time systems require safe (worst-case) predictability of the hit ratio. In this paper, we derive the relations among the cache parameters that guarantee the ideal (predictable) data hit ratio, assuming a Least-Recently-Used (LRU) data cache. Based on this analysis, we compare a tiled matrix transposition with a cache-oblivious algorithm modified with phantom padding to improve its data hit ratio. Our results show that, with an adequate tile size, the tiled version achieves an equal or better data hit ratio. We also analyze the energy consumption and execution time of matrix tra...
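The effect of tiling on the hit ratio can be illustrated with a small simulation. The following Python sketch is not the paper's analytical model: it assumes a fully associative LRU cache of 64 lines of 64 bytes, 8-byte elements, write-allocate stores, and arrays A and B laid out back to back in memory. The names `LRUCache`, `transpose_trace` and `simulate` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that tracks line tags only (no data)."""

    def __init__(self, num_lines, line_bytes):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.lines = OrderedDict()  # tag -> None, ordered from LRU to MRU
        self.hits = 0
        self.accesses = 0

    def access(self, addr):
        self.accesses += 1
        tag = addr // self.line_bytes
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)  # becomes most recently used
        else:
            if len(self.lines) >= self.num_lines:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[tag] = None

    def hit_ratio(self):
        return self.hits / self.accesses if self.accesses else 0.0


def transpose_trace(n, tile, elem_bytes=8):
    """Yield (read, write) byte addresses for B = A^T on n x n row-major arrays.

    tile == n reproduces the plain row-by-row loop; a smaller tile gives the
    tiled loop. A starts at address 0 and B immediately after it (assumption).
    """
    base_b = n * n * elem_bytes
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    read_addr = (i * n + j) * elem_bytes            # A[i][j]
                    write_addr = base_b + (j * n + i) * elem_bytes  # B[j][i]
                    yield read_addr, write_addr


def simulate(n, tile, num_lines=64, line_bytes=64, elem_bytes=8):
    cache = LRUCache(num_lines, line_bytes)
    for read_addr, write_addr in transpose_trace(n, tile, elem_bytes):
        cache.access(read_addr)   # load A[i][j]
        cache.access(write_addr)  # store B[j][i], write-allocate assumed
    return cache.hit_ratio()


print(simulate(128, 128))  # plain loop: 0.4375
print(simulate(128, 8))    # 8 x 8 tiles: 0.875
```

Under these assumptions the plain loop misses on every store, because a column of B spans 128 lines and the cache holds only 64, while each 8 x 8 tile touches just 16 lines and loads each of them once.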
We present new performance models and more compact data structures for cache blocking when...
In this paper we construct an analytic model of cache misses during matrix multiplication. T...
We develop a prototype library for in-place (dense) matrix storage format conversion between the ca...
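The target formats of that library are cut off above, so the following Python sketch uses the simplest in-place conversion as an illustrative assumption: row-major to column-major for an m x n matrix, which is an in-place rectangular transposition. It uses cycle following with O(1) extra memory; `next_index` and `to_column_major_inplace` are hypothetical names.

```python
def next_index(k, m, n):
    """Position that row-major index k takes in column-major order (m x n matrix)."""
    last = m * n - 1
    if k == last:
        return k
    # For k < mn - 1, k = i*n + j moves to j*m + i, which equals k*m mod (mn - 1).
    return (k * m) % last


def to_column_major_inplace(a, m, n):
    """Permute flat list a (m x n, row-major) into column-major order in place.

    Each permutation cycle is rotated exactly once, starting from its
    smallest index (the cycle leader), so no visited-flag array is needed.
    """
    size = m * n
    for start in range(1, size - 1):
        # Walk the cycle; skip it if an index smaller than start belongs to it.
        k = next_index(start, m, n)
        while k > start:
            k = next_index(k, m, n)
        if k != start:
            continue
        # Rotate the values one step along the cycle.
        carried = a[start]
        k = start
        while True:
            dest = next_index(k, m, n)
            a[dest], carried = carried, a[dest]
            k = dest
            if k == start:
                break
    return a


a = [0, 1, 2,
     3, 4, 5]  # 2 x 3, row-major
to_column_major_inplace(a, 2, 3)
print(a)  # [0, 3, 1, 4, 2, 5]
```

The cycle-leader test trades extra passes over each cycle for constant memory; a bit array of visited positions would be faster but would no longer be strictly in place.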
As processing capabilities have outstripped memory transfer speeds, memory management c...
Intuitively, a cache-oblivious algorithm implements an adaptive strategy that runs efficie...
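A standard example of such an adaptive strategy is recursive transposition: the algorithm keeps halving the longer side of the current block, so at some recursion depth the blocks fit in each cache level without the code knowing the cache size. The following Python sketch assumes flat row-major storage; the function name and the constant base-case size `base` (chosen only to limit recursion overhead, not tuned to any cache) are hypothetical, and no phantom padding is applied.

```python
def transpose_cache_oblivious(a, m, n, base=8):
    """Return B = A^T, where A is m x n and both are flat row-major lists."""
    b = [0] * (m * n)

    def rec(r0, r1, c0, c1):
        rows, cols = r1 - r0, c1 - c0
        if rows <= base and cols <= base:
            # Small block: transpose it directly.
            for i in range(r0, r1):
                for j in range(c0, c1):
                    b[j * m + i] = a[i * n + j]
        elif rows >= cols:
            mid = r0 + rows // 2  # split the taller side
            rec(r0, mid, c0, c1)
            rec(mid, r1, c0, c1)
        else:
            mid = c0 + cols // 2  # split the wider side
            rec(r0, r1, c0, mid)
            rec(r0, r1, mid, c1)

    rec(0, m, 0, n)
    return b


print(transpose_cache_oblivious([1, 2, 3, 4, 5, 6], 2, 3))  # [1, 4, 2, 5, 3, 6]
```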
In this work, the performance of basic and Strassen’s matrix multiplication algorithms ar...
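For reference, Strassen's algorithm replaces the eight half-size block products of the basic recursive method with seven. The following Python sketch assumes square matrices whose size is a power of two and falls back to the basic triple loop below a hypothetical `cutoff`; all function names are illustrative.

```python
def mat_add(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def mat_sub(x, y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def basic_matmul(x, y):
    """Basic O(n^3) multiplication."""
    n, k, p = len(x), len(y), len(y[0])
    return [[sum(x[i][t] * y[t][j] for t in range(k)) for j in range(p)]
            for i in range(n)]


def split(x):
    """Split a square matrix into four equal quadrants."""
    h = len(x) // 2
    return ([row[:h] for row in x[:h]], [row[h:] for row in x[:h]],
            [row[:h] for row in x[h:]], [row[h:] for row in x[h:]])


def strassen(x, y, cutoff=2):
    """Strassen multiplication; len(x) is assumed to be a power of two."""
    if len(x) <= cutoff:
        return basic_matmul(x, y)
    a11, a12, a21, a22 = split(x)
    b11, b12, b21, b22 = split(y)
    # Seven recursive products instead of eight.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), cutoff)
    m2 = strassen(mat_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, mat_sub(b12, b22), cutoff)
    m4 = strassen(a22, mat_sub(b21, b11), cutoff)
    m5 = strassen(mat_add(a11, a12), b22, cutoff)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), cutoff)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), cutoff)
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_sub(mat_add(m1, m3), m2), m6)
    # Reassemble the four quadrants of C.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

The extra additions and temporary quadrants make Strassen's method slower than the basic one for small sizes, which is why a cutoff to the basic algorithm is commonly used.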
This report deals with the efficient calculation of matrix-matrix multiplication, without using explici...
Many scientific applications handle compressed sparse matrices. Cache behavior during the executio...
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FF...
An effectively designed and efficiently used memory hierarchy, composed of scratch-pads or ...
Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchie...
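As an illustration of blocking, the following Python sketch tiles all three loops of a dense matrix multiplication so that each step works on bs x bs sub-blocks of A, B and C. Python itself will not show the cache benefit; the sketch only shows the loop structure. The function name and the block size `bs` are hypothetical, and in practice bs would be chosen so that the three blocks fit in the target cache.

```python
def blocked_matmul(a, b, n, bs=4):
    """C = A * B for n x n matrices stored as lists of lists, in bs x bs blocks."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Accumulate block A[ii.., kk..] * B[kk.., jj..] into C[ii.., jj..].
                for i in range(ii, min(ii + bs, n)):
                    ci, ai = c[i], a[i]
                    for k in range(kk, min(kk + bs, n)):
                        aik, bk = ai[k], b[k]
                        for j in range(jj, min(jj + bs, n)):
                            ci[j] += aik * bk[j]
    return c
```

The `min(...)` bounds handle sizes that are not multiples of bs, at the cost of partial blocks along the edges.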
Sparse matrices are at the core of numerical applications. Their compressed storage, which permits...
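One widely used compressed storage scheme is Compressed Sparse Row (CSR), which keeps only the nonzero values, their column indices, and one pointer per row into those arrays. The following Python sketch builds CSR arrays from a dense matrix; CSR is chosen here as an example and may differ from the format used in that paper, and `dense_to_csr` is a hypothetical name.

```python
def dense_to_csr(dense):
    """Return (values, col_idx, row_ptr) for a dense matrix given as lists of lists."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # nonzero value
                col_idx.append(j)  # its column
        row_ptr.append(len(values))  # row i occupies values[row_ptr[i]:row_ptr[i+1]]
    return values, col_idx, row_ptr


print(dense_to_csr([[5, 0, 0],
                    [0, 0, 3],
                    [1, 2, 0]]))  # ([5, 3, 1, 2], [0, 2, 0, 1], [0, 1, 2, 4])
```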
Algorithms for sparse matrix-vector multiplication (SpMxV for short) are important building blocks...
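A basic SpMxV kernel over a matrix in Compressed Sparse Row (CSR) form is sketched below in Python, assuming CSR storage and a hypothetical function name `csr_spmv`. The values and column-index arrays are read sequentially, whereas the input vector x is accessed indirectly through the column indices; that irregular access pattern is what makes the cache behavior of SpMxV hard to predict.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A x for A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[p] * x[col_idx[p]]  # indirect (gather) access into x
        y.append(acc)
    return y


# CSR form of [[5, 0, 0], [0, 0, 3], [1, 2, 0]]
print(csr_spmv([5, 3, 1, 2], [0, 2, 0, 1], [0, 1, 2, 4], [1, 2, 3]))  # [5, 9, 5]
```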
In this paper we present a simple analytical model to predict the hit ratio for a direct-mapped cach...
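An analytical hit-ratio prediction can be checked against a trace-driven simulation. The following Python sketch is not the paper's model; it simulates a direct-mapped cache, in which every memory line maps to exactly one cache slot, so two addresses that share a slot evict each other. The function name and the cache geometry (32 lines of 64 bytes) are assumptions for illustration.

```python
def direct_mapped_hit_ratio(addresses, num_lines, line_bytes):
    """Hit ratio of a direct-mapped cache for a sequence of byte addresses."""
    slots = [None] * num_lines  # tag currently held by each slot
    hits = 0
    count = 0
    for addr in addresses:
        count += 1
        line = addr // line_bytes
        index = line % num_lines  # the only slot this line may occupy
        tag = line // num_lines
        if slots[index] == tag:
            hits += 1
        else:
            slots[index] = tag  # miss: replace whatever was there
    return hits / count if count else 0.0


# Sequential 8-byte accesses: one miss per 64-byte line.
print(direct_mapped_hit_ratio(range(0, 4096, 8), 32, 64))    # 0.875
# Two addresses 32 * 64 bytes apart share slot 0 and keep evicting each other.
print(direct_mapped_hit_ratio([0, 2048] * 100, 32, 64))      # 0.0
```

For a sequential scan with stride s bytes that divides the line size L, this reproduces the simple estimate 1 - s/L; the second trace shows a conflict-miss pattern that a fully associative cache of the same size would avoid.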