The widening gap between processor clock speed and memory latency puts added pressure on the performance of cache memories. This problem is amplified by the increase in the number of instructions issued per cycle. This paper reports on an initial evaluation of a split scalar and array data cache. The scheme allows efficient exploitation of both temporal and spatial locality by giving each data cache its own organization and block size. Initial experimental results show very significant improvements in hit rates on several Spec95fp and NAS benchmarks.
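As a rough illustration of the idea, the sketch below models a data cache split into a scalar cache and an array cache, each with its own block size. The specific sizes, the direct-mapped organization, and the caller-supplied scalar/array classification are assumptions made for illustration, not parameters taken from the paper.

```python
# Minimal sketch of a split scalar/array data cache.
# Sizes, block sizes, and the routing rule are illustrative assumptions.

class DirectMappedCache:
    def __init__(self, size_bytes, block_bytes):
        self.block_bytes = block_bytes
        self.num_blocks = size_bytes // block_bytes
        self.tags = [None] * self.num_blocks
        self.hits = self.misses = 0

    def access(self, addr):
        block = addr // self.block_bytes
        index = block % self.num_blocks
        tag = block // self.num_blocks
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag


class SplitDataCache:
    """Route scalar references to a small-block cache (temporal reuse)
    and array references to a large-block cache (spatial locality)."""
    def __init__(self):
        self.scalar = DirectMappedCache(size_bytes=4 * 1024, block_bytes=8)    # assumed
        self.array = DirectMappedCache(size_bytes=16 * 1024, block_bytes=64)   # assumed

    def access(self, addr, is_array):
        # In a real design the classification would come from the compiler
        # or the memory region; here the caller supplies it directly.
        (self.array if is_array else self.scalar).access(addr)


if __name__ == "__main__":
    cache = SplitDataCache()
    for i in range(1024):
        cache.access(0x1000 + 8 * i, is_array=True)   # sequential array sweep
        cache.access(0x8000, is_array=False)          # repeatedly reused scalar
    print("array  hits/misses:", cache.array.hits, cache.array.misses)
    print("scalar hits/misses:", cache.scalar.hits, cache.scalar.misses)
```

Splitting this way lets the array cache use long blocks to fetch spatial neighbors in advance, while the scalar cache keeps many short blocks so frequently reused scalars are not evicted by streaming array data.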
This paper shows that even very small reconfigurable data caches, when split to serve data streams ...
Exploring the performance of split data cache schemes on superscalar processors and symmetric multip...
For the past decade, microprocessors have been improving in overall performance at a rate of ap...
The purpose of this paper is to reevaluate the performance of the Split Temporal/Spatial (STS) cache...
In this paper we show that partitioning the data cache into array and scalar caches can improve cache ac...
Abstract — As more cores (processing elements) are included in a single chip, it is likely that the ...
Current split data caches classify data as having either spatial locality or temporal locality. The...
The goal of cache design is to exploit data localities; however, the means to this end vary widely a...
Abstract. Future embedded systems are expected to use chip-multiprocessors to provide the execution ...
Abstract—In most embedded and general purpose architectures, stack data and non-stack data is cache...
A scalar metric for temporal locality is proposed. The metric is based on LRU stack distance. This p...
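Since LRU stack distance is the quantity such a metric builds on, a short sketch may help. The function below computes per-reference stack distances for an address trace; how those distances are aggregated into the proposed scalar score is not shown here, and any particular aggregation would be an assumption.

```python
# Minimal sketch: LRU stack distance of each reference in a trace.

def lru_stack_distances(trace):
    """Return the LRU stack distance of each reference in `trace`
    (None for the first touch of an address, i.e. a cold miss)."""
    stack = []          # most recently used address kept at the end
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            distances.append(len(stack) - 1 - pos)  # depth from the top of the stack
            stack.pop(pos)
        else:
            distances.append(None)                  # infinite distance on first touch
        stack.append(addr)                          # promote to most recently used
    return distances


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "b", "d", "a"]
    print(lru_stack_distances(trace))
    # [None, None, None, 2, 2, 0, None, 2]
```

Small stack distances indicate strong temporal locality (recently used data is reused soon), which is why a distribution of these distances can be condensed into a locality score.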
Abstract. Caches are widely used to reduce the speed gap between processors and memories. However, th...
During the last two decades, CPU performance has improved much faster than that of memo...
Memory, as a shared resource, has always been a high-latency, bandwidth-limited bottleneck of the...
Treating data based on its location in memory has received much attention in recent years due to its...