Matrix transposition is a fundamental operation, but for large matrices its data cache hit ratio can be very low and hard to predict. Real-time systems require safe (worst-case) predictability of the hit ratio. In this paper, we derive the relations among the cache parameters that guarantee the ideal (predictable) data hit ratio, assuming a Least-Recently-Used (LRU) data cache. Based on this analysis, we compare a tiled matrix transposition with a cache-oblivious algorithm modified with phantom padding to improve its data hit ratio. Our results show that, with an adequate tile size, the tiled version achieves an equal or better data hit ratio. We also analyze the energy consumption and execution time of matrix tra...
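As an illustration of the tiling idea mentioned in the abstract, the following is a minimal sketch of a tiled (blocked) out-of-place transposition. It is not the paper's implementation: the function name `transpose_tiled`, the flat row-major list layout, and the default `tile` value are assumptions made for this example, and the tile size is not derived from any cache parameters.

```python
def transpose_tiled(a, n_rows, n_cols, tile=4):
    """Transpose a row-major n_rows x n_cols matrix stored in a flat list.

    Hypothetical helper for illustration; `tile` is an arbitrary block
    edge length, not a value taken from the paper's analysis.
    """
    out = [0] * (n_rows * n_cols)
    # Visit the matrix one tile at a time so that the source rows and
    # destination rows touched together stay within a small working set.
    for i0 in range(0, n_rows, tile):
        for j0 in range(0, n_cols, tile):
            # Clamp the tile bounds to handle sizes that are not
            # multiples of `tile`.
            for i in range(i0, min(i0 + tile, n_rows)):
                for j in range(j0, min(j0 + tile, n_cols)):
                    # Element (i, j) of the input becomes (j, i) of the
                    # output, which has n_cols rows and n_rows columns.
                    out[j * n_rows + i] = a[i * n_cols + j]
    return out
```

In a compiled language the same loop nest would be applied to contiguous arrays, where the choice of tile size relative to the cache line size, associativity, and capacity is what determines the hit ratio.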
Computer system and network performance can be significantly improved by cachi...
In this paper we present a simple analytical model to predict the hit ratio for a direct mapped cach...
This report deals with the efficient calculation of matrix-matrix multiplication, without using explici...
Intuitively, a cache-oblivious algorithm implements an adaptive strategy which runs efficie...
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FF...
As computation processing capabilities have outstripped memory transport speeds, memory management c...
An effectively designed and efficiently used memory hierarchy, composed of scratch-pads or ...
Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchie...
In this paper we construct an analytic model of cache misses during matrix multiplication. The analy...
In this work, the performance of basic and Strassen’s matrix multiplication algorithms ar...
While caches have become invaluable for higher-end architectures due to their ability to hide, in pa...
Many scientific applications handle compressed sparse matrices. Cache behavior during the executio...
This paper presents a method for tight prediction of worst-case performance of data caches in high-p...