With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate this overhead, modern cache-coherent protocols offer support for Atomic Memory Operations (AMOs) that can be executed near-core (near) or remotely in the on-chip memory hierarchy (far). This paper evaluates the static AMO execution policies currently implemented in multi-core Systems-on-Chip (SoC) designs, which select where an AMO executes (near or far) based on the cache block's coherence state. We propose three static policies and show that the performance of static policies is application-dependent. Moreover, we show that one of our proposed static policies ...
Architects have adopted the shared memory model that implicitly manages cache coherence and cache ca...
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family ...
Manycore processors feature a high number of general-purpose cores designed to work in a...
Efficient fine-grain synchronization is a classic computer architecture challenge that has been prof...
Inability to hide main memory latency has been increasingly limiting the performance of modern proce...
We introduce the Execution Migration Machine (EM²), a novel data-centric multicore memory system arc...
We introduce the Execution Migration Machine (EM2), a novel, scalable shared-memory architecture for...
University of Technology Sydney. Faculty of Engineering and Information Technology. Chip Multi-Proces...
Asymmetric coherency is a new optimisation method for coherency policies to su...
Driven by increasingly unbalanced technology scaling and power dissipation limits, microprocessor d...
As transistor density continues to grow geometrically, processor manufacturers are already able to p...
On the road to computer systems able to support the requirements of exascale applications, Chip Mult...
The evolution of microprocessor design in the last few decades has changed significantly, moving fro...
Maximal utilization of cores in multicore architectures is key to realize the potential performance ...
The performance of modern microprocessors is increasingly limited by their inability to hide main me...