Techniques to evaluate a program’s cache performance fall into two camps: 1. Traditional trace-based cache simulators precisely account for sophisticated real-world cache models and support arbitrary workloads, but their runtime is proportional to the number of memory accesses performed by the program under analysis. 2. Analytical approaches, which rely on implicit workload characterizations such as the polyhedral model, often achieve problem-size-independent runtimes, but have so far been limited to idealized cache models. We introduce a hybrid approach, warping cache simulation, that aims to combine applicability to real-world cache models with problem-size-independent runtimes. Like prior analytical approaches, we focus on programs in ...
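To make the cost of the first camp concrete, here is a minimal sketch of a trace-based simulator for a set-associative LRU cache. It is not the warping technique introduced above, only an illustration of why trace-based simulation takes time proportional to the number of memory accesses. The names `simulate_lru` and `row_major_trace` and the cache parameters (4 sets, 2 ways, 64-byte lines) are assumptions chosen for this example.

```python
from collections import OrderedDict


def simulate_lru(trace, num_sets=4, ways=2, line_size=64):
    """Count misses for a trace of byte addresses on a set-associative LRU cache.

    All parameters are illustrative assumptions, not values from the text.
    """
    # One ordered dict per cache set; insertion order tracks recency.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:  # one simulation step per memory access
        block = addr // line_size  # memory block containing the address
        lines = sets[block % num_sets]  # cache set the block maps to
        if block in lines:
            lines.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
    return misses


def row_major_trace(n, elem_size=8, base=0):
    """Addresses read by a doubly nested loop over an n-by-n array A[i][j]."""
    return [base + (i * n + j) * elem_size for i in range(n) for j in range(n)]
```

Calling `simulate_lru(row_major_trace(n))` visits all n² accesses one by one, so doubling `n` roughly quadruples the simulation time. The analytical and hybrid approaches described above aim to avoid exactly this dependence on problem size.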
Two concurrent factors challenge the evaluation of large-scale cache networks:...
We present an efficient runtime cache to accelerate the display of procedurally displaced and textur...
In this paper we present a method for determining the cache performance of the loop nests in a progr...
This thesis presents a generic approach towards compiling fast execution-driven simulators, and appl...
This paper presents a generic approach for compiling fast execution-driven simulators, and applies t...
Because of the infeasibility or expense of large fully-associative caches, cache memories are often ...
We present a new technique for the parallel simulation of cache coherent shared memory multiprocess...
We present a cache performance modeling methodology that facilitates the tuning of uniprocessor cach...
Cache behavior is complex and inherently unstable, yet it is a critical factor affecting program per...
Computers become increasingly complex. Current and future systems feature configurable hardware, mul...
We describe novel techniques used for efficient simulation of memory in SimICS, an instruction leve...
Application performance on computer processors depends on a number of complex architectural and micr...
To increase performance, modern processors employ complex techniques such as out-of-order pipelines ...
As multiprocessor systems-on-chip become a reality, performance modeling becomes a challenge. To qu...