This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD and are implemented on the MasPar MP-2 architecture. The other two algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The other two SIMD algorithms are more complex, but their costs are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of refer...
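The abstract relies on the notion of an LRU stack distance. The sketch below is a minimal serial illustration under stated assumptions. It is not one of the paper's five algorithms, is not parallel, and is not the "efficient serial algorithm" the abstract mentions. It computes stack distances with a naive linear-search LRU stack, following the classic stack-processing idea. It then derives hit ratios for every cache size from a single pass. The function names (`stack_distances`, `hit_ratio`, `hit_ratios_all_sizes`) are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def stack_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """Return the 1-based LRU stack distance of each reference.

    A first reference to an address gets None (infinite distance, a cold miss).
    Hypothetical helper: naive list-based stack, O(depth) work per reference.
    """
    stack: List[Hashable] = []  # index 0 holds the most-recently-used address
    distances: List[Optional[int]] = []
    for addr in trace:
        try:
            depth = stack.index(addr)  # linear search down the LRU stack
        except ValueError:
            distances.append(None)  # never seen before: cold miss
        else:
            distances.append(depth + 1)  # 1-based stack distance
            stack.pop(depth)  # remove from its old position
        stack.insert(0, addr)  # the referenced address becomes the MRU entry
    return distances


def hit_ratio(distances: Sequence[Optional[int]], cache_size: int) -> float:
    """Hit ratio of a fully associative LRU cache holding cache_size blocks."""
    if not distances:
        return 0.0
    hits = sum(1 for d in distances if d is not None and d <= cache_size)
    return hits / len(distances)


def hit_ratios_all_sizes(distances: Sequence[Optional[int]], max_size: int) -> List[float]:
    """Hit ratios for cache sizes 1..max_size from one histogram of distances."""
    if not distances:
        return [0.0] * max_size
    counts = Counter(d for d in distances if d is not None)
    ratios: List[float] = []
    hits = 0
    for size in range(1, max_size + 1):
        hits += counts[size]  # references with distance == size now also hit
        ratios.append(hits / len(distances))
    return ratios


if __name__ == "__main__":
    trace = ["a", "b", "a", "c", "b", "a"]
    d = stack_distances(trace)
    print(d)  # [None, None, 2, None, 3, 3]
    print(hit_ratios_all_sizes(d, 3))
```

A reference hits in an LRU cache of size C exactly when its stack distance is at most C. Computing every stack distance once therefore yields the hit ratio for all cache sizes, which is why algorithms that compute all stack distances are attractive. This naive version's per-reference cost grows with the stack depth, so it only illustrates the quantity being computed, not an efficient way to compute it.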
The performance gap between processors and main memory has been growing over the last decades. Fast ...
Because of the infeasibility or expense of large fully-associative caches, cache memories are often ...
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification...
We present a new technique for the parallel simulation of cache coherent shared memory multiprocess...
Today’s scientific progress is closely related to data processing, a process implemented using ...
An application’s cache miss rate is used in timing analysis, system performance prediction and ...
Techniques to evaluate a program’s cache performance fall into two camps: 1. Traditional trace-base...
Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support s...
We investigate the construction and application of parallel software caches in shared memory multipr...
This paper explores statistical simulation as a fast simulation technique for driving chip multiproc...
This thesis evaluates an innovative cache design called prime-mapped cache. The performance analysi...
In this paper, we consider the evaluation of the memory hierarchy of multiprocessor systems via para...
In this research we built a SystemC Level-1 data cache system in a distributed shared memory archite...
Memory systems today possess more complexity than ever. On one hand, main memory technology has a mu...
We describe a model that enables us to analyze the running time of an algorithm in a computer with a...