In this work we reduce interconnect power dissipation in symmetric multiprocessors (SMPs). We revisit snoopy cache coherence protocols and reduce unnecessary interconnect activity by speculating on which nodes are expected to provide the missing data. Conventional snoopy cache coherence protocols broadcast requests to all nodes, which reduces the latency of cache-to-cache transfer misses at the expense of increased interconnect power. We show that the associated power dissipation can be reduced if such requests are broadcast selectively, only to the nodes more likely to provide the missing data. Power is saved because only the interconnect components between the requester and the supplier node are accessed. We evaluate our technique using shared...
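The idea of selective broadcast can be illustrated with a toy model. The sketch below is not the predictor or the protocol evaluated in this work; it assumes a hypothetical last-supplier predictor per (requester, block) pair, a fallback to full broadcast on a misprediction, and a block that migrates to the requester on every miss. The class name `SelectiveSnoopModel`, its fields, and the message-count cost metric are all illustrative assumptions.

```python
class SelectiveSnoopModel:
    """Toy model counting snoop messages under selective broadcast.

    Assumptions (not taken from the paper): a last-supplier predictor,
    full-broadcast fallback on misprediction, and migratory ownership.
    """

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.owner = {}      # block -> node currently holding the block
        self.predictor = {}  # (requester, block) -> last node that supplied it
        self.messages = 0    # total snoop messages sent so far

    def miss(self, requester, block):
        """Handle one miss and return the number of snoop messages sent."""
        supplier = self.owner.get(block)
        guess = self.predictor.get((requester, block))
        broadcast_cost = self.num_nodes - 1  # snoop every other node

        if guess is None:
            sent = broadcast_cost            # no history: conventional broadcast
        elif guess == supplier:
            sent = 1                         # correct guess: one targeted request
        else:
            sent = 1 + broadcast_cost        # wrong guess: retry with broadcast

        if supplier is not None:
            self.predictor[(requester, block)] = supplier  # train the predictor
        self.owner[block] = requester        # block migrates to the requester
        self.messages += sent
        return sent
```

In this model a correct prediction replaces a broadcast to all other nodes with a single targeted request, while a misprediction costs one extra message on top of the broadcast. Any real saving therefore depends on predictor accuracy, which this sketch does not attempt to estimate.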
Many-core architectures provide an efficient way of harnessing the increasing numbers of transistors...
Caches have the potential to provide multiprocessors with an automatic mechanism for reducing both n...
In a shared-memory multiprocessor with private caches, cached copies of a data item must be kept con...
This dissertation explores techniques for reducing the costs of inter-processor communication i...
With transistor miniaturization leading to an abundance of on-chip resources and uniprocessor design...
Parallel applications exhibit a wide variety of memory reference patterns. Designing a memory archit...
Design complexity and limited power budget are causing the number of cores on the same chip to grow ...
This paper presents a low overhead, high performance cache coherence protocol designed to exploit hi...
A cache coherence protocol for a multiprocessor system. Each processor in the system has...
Multicore systems have reached a stage where they are inevitable in the embedded world. This transit...
Integrating more processor cores on-die has become the unanimous trend in the microprocessor industr...
Many-core architectures provide an efficient way of harnessing the growing numbers of transistors av...
This paper evaluates several techniques to save leakage in CMP L2 caches by selectively swi...
We present a novel methodology for power reduction in embedded multiprocessor systems. Maintaining l...
This paper evaluates several techniques to save leakage in CMP L2 caches by selectively switching of...