Contention for shared memory, in the form of true sharing and false sharing, is a challenging performance bug to discover and repair. Understanding cache contention requires global knowledge of the program's actual sharing behavior, and contention can even arise invisibly due to the opaque decisions of the memory allocator. Previous schemes have focused only on false sharing, and they impose significant performance penalties or require non-trivial alterations to the operating system or runtime system environment. This paper presents the Light, Accurate Sharing dEtection and Repair (LASER) system, which leverages new performance counter capabilities available on Intel's Haswell architecture that identify the source of expensive cach...
Chip multiprocessors (CMPs) have become virtually ubiquitous due to the increasing impact of power a...
Shared last-level caches, widely used in chip-multi-processors (CMPs), face two fundamental limitati...
High-end computing increasingly relies on shared-memory multiprocessors (SMPs), such as clusters of ...
This dissertation explores techniques for reducing the costs of inter-processor communication i...
The abstraction of a cache is useful to hide the vast difference in speed of computer processors and...
Software distributed shared memory (DSM) platforms on networks of workstations tolerate large networ...
This paper analyzes the sources of performance losses in hardware transactiona...
This work explores the fault tolerance of successive approximation algorithms,...
The last-level cache (LLC) is one of the GPU’s main shared resources that contributes to improve per...
False sharing is a notorious performance problem that may occur in multithreaded programs when they ...
False sharing (FS) is a well-known problem occurring in multiprocessor systems. It results in perfor...
For a parallel architecture to scale effectively, communication latency between proce...
In chip multiprocessors, replication of cache lines is allowed to reduce the latency each cor...
Architects have adopted the shared memory model that implicitly manages cache coherence and cache ca...
Multi-core computers are infamous for being hard to use in time-critical systems due to execution-ti...