The cost of a cache miss depends heavily on the location of the main memory that backs the missing line. For certain applications, this cost is a major factor in overall performance. We report on the utility of OS-based page placement as a mechanism to increase the frequency with which cache fills access local memory in a distributed shared memory multiprocessor. Even with the very simple policy of first-use placement, we find significant improvements over round-robin placement for many applications on both hardware- and software-coherent systems. For most of our applications, dynamic placement allows 35 to 75 percent of cache fills to be performed locally, resulting in performance improvements of 20 to 40 percent. We have also investigate...
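The difference between the two placement policies named above can be illustrated with a toy model. The sketch below is not the authors' simulator or workload: the page size, the trace format (a list of `(node, address)` cache-fill events), and all function names are assumptions made for illustration. It homes each page either on the node that first touches it (first-use) or on node `page % num_nodes` (round-robin), then counts the fraction of fills served from local memory.

```python
PAGE_SIZE = 4096  # bytes per page; an assumed value, not taken from the abstract


def round_robin_home(page, num_nodes):
    """Round-robin placement: page i is homed on node i mod num_nodes."""
    return page % num_nodes


def place_first_use(trace, page_size=PAGE_SIZE):
    """First-use placement: a page is homed on the first node that touches it."""
    homes = {}
    for node, addr in trace:
        homes.setdefault(addr // page_size, node)
    return homes


def local_fraction(trace, home_of, page_size=PAGE_SIZE):
    """Fraction of cache fills in the trace whose page is homed on the requesting node."""
    local = sum(1 for node, addr in trace if home_of(addr // page_size) == node)
    return local / len(trace)


def demo():
    """Synthetic trace: each of 4 nodes repeatedly misses on its own 8 private pages."""
    num_nodes = 4
    trace = [
        (node, (node * 8 + page) * PAGE_SIZE)
        for node in range(num_nodes)
        for page in range(8)
        for _ in range(10)
    ]
    first_use = place_first_use(trace)
    fu = local_fraction(trace, first_use.__getitem__)
    rr = local_fraction(trace, lambda page: round_robin_home(page, num_nodes))
    return fu, rr


if __name__ == "__main__":
    fu, rr = demo()
    print(f"first-use local fills:   {fu:.2f}")
    print(f"round-robin local fills: {rr:.2f}")
```

With fully private data, as in this synthetic trace, first-use placement makes every fill local, while round-robin leaves only about one fill in `num_nodes` local. Real applications share pages between nodes, so the gap is smaller; the figures quoted in the abstract come from the authors' measurements, not from this model.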
Memory (cache, DRAM, and disk) is in charge of providing data and instructions to a computer's pr...
Shared last level cache has been widely used in modern multicore processors. However, uncontrolled c...
Next generation computer systems will have gigabytes of physical memory and processors in the 200 MI...
In future multi-cores, large amounts of delay and power will be spent accessing data...
Several cache-coherent shared-memory multiprocessors have been developed that are scalable and offer...
Hybrid cache architecture (HCA), wh...
There is a lack of flexibility in a fixed page location system because every location within a page ...
Static cache partitioning can reduce inter-application cache interference and improve the composite ...
This paper presents user-level dynamic page migration, a runtime technique which transparently enabl...
This thesis proposes a software-oriented distributed shared cache management approach for chip multi...
We present design details and some initial performance results of a novel scalable shared memory mul...
Phase-Change Memory (PCM) technology has received substantial attention recently. Because PCM is byt...
The performance of multiprogrammed shared-memory multiprocessors often suffers from scheduler interv...
This paper presents and studies a distributed L2 cache management approach through OS-level page all...