To maximize the benefit and minimize the overhead of software-based latency tolerance techniques, we would like to apply them precisely to the set of dynamic references that suffer cache misses. Unfortunately, the information provided by the state-of-the-art cache miss profiling technique (summary profiling) is inadequate for references with intermediate miss ratios: it results either in failing to hide latency or in inserting unnecessary overhead. To overcome this problem, we propose and evaluate a new technique, correlation profiling, which improves predictability by correlating the caching behavior with the associated dynamic context. Our experimental results demonstrate that roughly half of the 22 non-numeric applications we stu...
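The contrast between summary profiling and correlation profiling can be illustrated with a minimal Python sketch. It assumes a hypothetical trace format in which each dynamic reference is recorded as a (static load id, context, missed) triple. The context is an opaque label standing in for whatever dynamic context a real profiler would record, such as a recent control-flow path. The names `summary_profile`, `correlation_profile`, `select_targets`, and the 0.9 cutoff are illustrative only. This is not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical trace record: (static load id, dynamic context label, missed?)
Event = Tuple[str, Hashable, bool]


def summary_profile(trace: Iterable[Event]) -> Dict[str, float]:
    """Miss ratio per static reference, ignoring the dynamic context."""
    misses: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for ref, _context, missed in trace:
        total[ref] += 1
        misses[ref] += int(missed)
    return {ref: misses[ref] / total[ref] for ref in total}


def correlation_profile(trace: Iterable[Event]) -> Dict[Tuple[str, Hashable], float]:
    """Miss ratio per (static reference, dynamic context) pair."""
    misses: Dict[Tuple[str, Hashable], int] = defaultdict(int)
    total: Dict[Tuple[str, Hashable], int] = defaultdict(int)
    for ref, context, missed in trace:
        key = (ref, context)
        total[key] += 1
        misses[key] += int(missed)
    return {key: misses[key] / total[key] for key in total}


def select_targets(profile: Dict, threshold: float = 0.9) -> List:
    """Profile keys whose miss ratio justifies inserting latency-tolerance code.

    The 0.9 threshold is an arbitrary illustrative choice (hypothetical).
    """
    return sorted(key for key, ratio in profile.items() if ratio >= threshold)


if __name__ == "__main__":
    # Hypothetical trace: load "ld1" always hits in context "A"
    # and always misses in context "B".
    trace = [("ld1", "A", False), ("ld1", "B", True)] * 50
    print(summary_profile(trace))                      # {'ld1': 0.5}
    print(correlation_profile(trace))                  # {('ld1', 'A'): 0.0, ('ld1', 'B'): 1.0}
    print(select_targets(summary_profile(trace)))      # []
    print(select_targets(correlation_profile(trace)))  # [('ld1', 'B')]
```

In this toy trace, summary profiling reports an intermediate miss ratio of 0.5 for `ld1`. That leaves only two options: insert latency-tolerance code for every instance, which wastes overhead on the hits, or for none, which leaves the misses exposed. Keying the profile on the dynamic context separates the always-hit and always-miss cases, so the code can be applied only where the misses occur.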
Prior knowledge of the target application leads to new optimization and customization opportunities ...
The growing gap between processor clock speed and DRAM access time puts new demands on software and ...
grantor: University of Toronto. The latency of accessing instructions and data from the memo...
There is an ever-widening performance gap between processors and main memory, a gap bridged by small...
Improving cache performance requires understanding cache behavior. However, measuring cache performa...
The most important processor performance bottleneck is the ever-increasing gap between the memory an...
Applications often under-utilize cache space, and there are no software locality optimization techniq...
While caches have become invaluable for higher-end architectures due to their ability to hide, in pa...
Trace caches are used to help dynamic branch prediction make multiple predictions in a cycle by embe...
As the processor-memory performance gap continues to grow, so does the need for effective tools and ...
Hard-to-predict branches depending on long-latency cache misses have been recognized as a major perf...
Address correlation is a technique that links the addresses that reference the same data values. Usi...
As software applications increase in complexity, the description of hardware is becoming increas...