As hardware parallelism continues to increase, CPU caches can no longer be considered a transparent, hardware-level performance optimization. Adverse cache impact on performance is entirely workload-dependent and may vary with runtime factors. To effectively support workloads on parallel hardware, the operating system must begin to treat CPU caches like any other shared hardware resource. We present a binary translation system called Cachekata that efficiently provides a byte-granular memory remapping facility within the OS. Cachekata is incorporated into a larger system, Plastic, which diagnoses and corrects instances of false sharing occurring within running applications. Our implementation is able to achieve a 3-6x performance ...
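To make "byte-granular remapping to correct false sharing" concrete, the sketch below models only the layout decision. It detects cache lines written by more than one thread, then computes an old-offset to new-offset map that gives each thread its own cache-line-aligned region. The 64-byte line size, the `(name, offset, size, thread_id)` field description, and the function names are hypothetical and chosen for illustration. This is not Cachekata's actual algorithm, which operates on running binaries through binary translation.

```python
from collections import defaultdict

CACHE_LINE = 64  # assumed cache-line size in bytes (hypothetical)


def lines_touched(offset, size, line=CACHE_LINE):
    """Cache-line indices spanned by the byte range [offset, offset + size); size >= 1."""
    return set(range(offset // line, (offset + size - 1) // line + 1))


def find_false_sharing(fields, line=CACHE_LINE):
    """fields: list of (name, offset, size, thread_id), one writer thread per field.
    Returns the sorted cache-line indices written by more than one thread."""
    writers = defaultdict(set)
    for _name, off, size, tid in fields:
        for idx in lines_touched(off, size, line):
            writers[idx].add(tid)
    return sorted(idx for idx, tids in writers.items() if len(tids) > 1)


def remap(fields, line=CACHE_LINE):
    """Hypothetical byte-granular remapping: old offset -> new offset.
    Packs each thread's fields together, then pads to the next line boundary
    so that no two threads' fields share a cache line."""
    by_thread = defaultdict(list)
    for field in fields:
        by_thread[field[3]].append(field)
    mapping = {}
    cursor = 0
    for tid in sorted(by_thread):
        for _name, off, size, _tid in by_thread[tid]:
            mapping[off] = cursor
            cursor += size
        cursor = -(-cursor // line) * line  # round up to a cache-line boundary
    return mapping
```

For example, with `a` and `c` written by thread 0 and `b` by thread 1, all three fields initially sit in line 0. After remapping, thread 1's field moves to offset 64 and no line is shared.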
We have developed compiler algorithms that analyze coarse-grained, explicitly parallel programs and ...
Contention for shared cache resources has been recognized as a major bottleneck for multicores—espec...
Three methods to maintain translation lookaside buffer (TLB) consistency in highly-parallel, shared-...
Memory (cache, DRAM, and disk) is in charge of providing data and instructions to a computer's pr...
Caching becomes very important in high-load computer applications. In a web application, a cache can impr...
Architects have adopted the shared memory model that implicitly manages cache coherence and cache ca...
Cache injection is a viable technique to improve the performance of data-intensive parallel applicat...
The abstraction of a cache is useful to hide the vast difference in speed of computer processors and...
An ideal high performance computer includes a fast processor and a multi-million byte memory of comp...
Measurements of actual supercomputer cache performance have not been previously undertaken. PFC-Sim i...
Recently, reconfigurable architectures, which outperform DSP processors, have become important. Alth...
Minimizing power, increasing performance, and delivering effective memory bandwidth are today's prim...
Dynamic binary translation systems enable a wide range of applications such as program instrumentati...
Introduction: As the microprocessor industry struggles to deliver higher performance superscalar and...