We present a set-associative page cache for scalable parallelism of IOPS in multicore systems. The design eliminates lock contention and hardware cache misses by partitioning the global cache into many independent page sets, each requiring a small amount of metadata that fits in a few processor cache lines. We extend this design with message passing among processors in a non-uniform memory architecture (NUMA). We evaluate the set-associative cache on 12-core processors and a 48-core NUMA system to show that it realizes the scalable IOPS of direct I/O (no caching) and matches the cache hit rates of Linux’s page cache. Set-associative caching maintains IOPS at scale, in contrast to Linux, for which IOPS crash beyond eight parallel threads.
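The partitioning idea can be illustrated with a minimal sketch. It is not the authors' implementation: the names `PageSet`, `SetAssociativeCache`, and `load`, the modulo mapping from page number to set, and the per-set LRU eviction are all assumptions made for illustration. What it shows is that each page maps to exactly one small set with its own lock, so threads touching different sets never contend on a global lock.

```python
import threading
from collections import OrderedDict


class PageSet:
    """One independent set: a few pages guarded by its own lock (hypothetical)."""

    def __init__(self, ways):
        self.ways = ways                # maximum number of pages held by this set
        self.lock = threading.Lock()    # per-set lock; there is no global lock
        self.pages = OrderedDict()      # page_no -> data, least recently used first

    def get(self, page_no, load):
        """Return (data, hit). On a miss, call load(page_no) and cache the result."""
        with self.lock:
            if page_no in self.pages:
                self.pages.move_to_end(page_no)   # mark as most recently used
                return self.pages[page_no], True
            # Simplification: the load runs while the set lock is held.
            data = load(page_no)
            if len(self.pages) >= self.ways:
                self.pages.popitem(last=False)    # assumed policy: evict LRU page in this set
            self.pages[page_no] = data
            return data, False


class SetAssociativeCache:
    """Global cache partitioned into many independent page sets (illustrative)."""

    def __init__(self, num_sets, ways):
        self.sets = [PageSet(ways) for _ in range(num_sets)]

    def set_index(self, page_no):
        # Assumed mapping: page number modulo the number of sets.
        return page_no % len(self.sets)

    def get(self, page_no, load):
        return self.sets[self.set_index(page_no)].get(page_no, load)
```

In this sketch a lookup touches only one set's lock and its small metadata, which is the property the abstract credits for avoiding lock contention. The NUMA message-passing extension is not modeled here.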
Directly mapped caches are an attractive option for processor designers as they combine fast lookup ...
Chip multiprocessors (CMPs) substantially increase capacity pressure on the on-chip memory hierarchy...
The microprocessor industry has converged on the chip multiprocessor (CMP) as the architecture of choice to ...
In future multi-cores, large amounts of delay and power will be spent accessing data...
With rapidly evolving technology, multicore and manycore processors have emerged as promising archit...
We introduce a new organization for multi-bank caches: the skewed-associative cache. A two-way skewe...
Shared last level cache has been widely used in modern multicore processors. However, uncontrolled c...
As the number of cores increases in both incoming and future shared-memory chip-multiprocessor (CMP...
This thesis studies the use of software methods to improve memory performance in a heterogeneous cac...
Cache memory is one of the most important components of a computer system. The cache allows quickly...
Chip multiprocessors have the potential to exploit thread level parallelism, particularly attractive...
As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page t...
While higher associativities are common at L-2 or Last-Level cache hierarchies, direct-ma...
We present design details and some initial performance results of a novel scalable shared memory mul...