This paper presents an experimental study of cache memory designs for vector computers. We use an execution-driven simulator to evaluate the vector cache performance of a set of application programs from the Perfect Club and SPEC92 benchmark suites. Our simulation results uncover several previously unknown facts. First, the prime-mapped cache that we recently proposed shows great performance potential in a vector processing environment: because of its conflict-free property, it performs significantly better than conventional cache designs for all applications considered. Second, performance results on the benchmarks indicate that data locality does exist in vector processing, although the effects of line size, a...
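To make the conflict-free property mentioned above concrete, the following is a minimal sketch, not the simulator or the hardware design from the paper. It assumes that a prime-mapped cache selects a line by taking the address modulo a prime number of lines, while a conventional direct-mapped cache takes it modulo a power of two. The function name `lines_touched`, the cache sizes, and the one-word-per-line simplification are all hypothetical choices made for illustration.

```python
def lines_touched(num_lines: int, stride: int, length: int, base: int = 0) -> int:
    """Count the distinct cache lines hit by one strided vector access.

    Assumption for this sketch: one word per cache line, and the line
    index is simply (word address) mod num_lines.
    """
    return len({(base + i * stride) % num_lines for i in range(length)})


POW2_LINES = 8192   # conventional direct-mapped cache: 2**13 lines (hypothetical size)
PRIME_LINES = 8191  # prime-mapped cache: 2**13 - 1 lines, a Mersenne prime (hypothetical size)
VECTOR_LENGTH = 64  # elements per vector access (hypothetical)

if __name__ == "__main__":
    for stride in (1, 2, 512, 8192):
        conv = lines_touched(POW2_LINES, stride, VECTOR_LENGTH)
        prime = lines_touched(PRIME_LINES, stride, VECTOR_LENGTH)
        print(f"stride {stride:5d}: power-of-two lines = {conv:2d}, prime lines = {prime:2d}")
```

Under these assumptions, a power-of-two stride such as 512 folds the 64 elements onto only 16 lines of the conventional cache, so elements of the same vector evict each other. With a prime number of lines, every element lands on a different line unless the stride is a multiple of the prime itself.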
Vector processors often use a cache to exploit temporal locality and reduce memory bandwidth demands...
To reduce the average memory access time, most current processors make use of a multilevel cache sub...
Measurements of actual supercomputer cache performance have not been previously undertaken. PFC-Sim i...
This paper introduces an innovative cache design for vector computers, called prime-mapped cache. By...
This paper introduces an innovative cache design for vector computers, called prime-mapped cache. By...
An innovative cache design for vector computers, called a prime-mapped cache, is introduced. By util...
In this paper, we use execution-driven simulation to study and compare vector processing performance...
Cache memory is an important level of the memory hierarchy, and its performance and implementation c...
In the past, vector supercomputers achieved high performance with long arithmetic pipelines coupled ...
The growing gap between sustained and peak performance for scientific applications has become a well...
This thesis evaluates an innovative cache design called the prime-mapped cache. The performance analysi...
The growing gap between sustained and peak performance for scientific applications is a well-known ...
The purpose of this paper is to show that using decoupling techniques in a vector processor, the per...
The purpose of this paper is to show that using decoupling techniques in a vector processor, the per...