Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to h...
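The following Python sketch is a rough illustration of what guaranteed delivery to colocated and distributed destinations involves. It models the idea in software. Destinations are grouped by node, and colocated tasks receive local copies. Each remote node receives a single network copy, which it then fans out to its own tasks. The sender retransmits to any node whose acknowledgements are still missing. This is a generic ack-and-retransmit scheme and not the hardware primitives described above. The functions `plan_multicast` and `multicast`, the `(node_id, task_id)` destination encoding, and the `drop_once` loss model are all hypothetical.

```python
from collections import defaultdict


def plan_multicast(src_node, dests):
    """Split destinations into colocated tasks and per-node remote groups.

    dests is an iterable of (node_id, task_id) pairs (a hypothetical encoding).
    Returns (local_tasks, remote), where remote maps node_id -> sorted task ids.
    """
    by_node = defaultdict(list)
    for node, task in sorted(set(dests)):
        by_node[node].append(task)
    local = by_node.pop(src_node, [])  # tasks sharing the sender's node
    return local, dict(by_node)


def multicast(src_node, payload, dests, drop_once=frozenset(), max_retries=3):
    """Ack-and-retransmit multicast over a simulated network.

    drop_once is a hypothetical loss model: the first network copy sent to
    each listed node is lost. Returns (inboxes, network_copies, complete).
    """
    dests = set(dests)
    local, remote = plan_multicast(src_node, dests)
    inboxes = defaultdict(list)
    acked = set()

    # Colocated destinations: delivered directly, no network traversal.
    for task in local:
        inboxes[(src_node, task)].append(payload)
        acked.add((src_node, task))

    network_copies = 0
    lost = set(drop_once)
    pending = remote
    for _ in range(1 + max_retries):
        if not pending:
            break
        for node, tasks in pending.items():
            network_copies += 1          # one copy per remote node, not per task
            if node in lost:
                lost.discard(node)       # this copy is dropped; a retry will arrive
                continue
            for task in tasks:           # fan-out happens at the receiving node
                inboxes[(node, task)].append(payload)
                acked.add((node, task))
        # Retransmit only to nodes with at least one unacknowledged task.
        pending = {n: ts for n, ts in pending.items()
                   if any((n, t) not in acked for t in ts)}

    complete = acked == dests
    return dict(inboxes), network_copies, complete


if __name__ == "__main__":
    inboxes, copies, ok = multicast(0, b"msg", {(0, 1), (1, 0), (1, 3), (2, 5)},
                                    drop_once={2})
    print(copies, ok)  # one copy each to nodes 1 and 2, plus one retry to node 2
```

Grouping destinations by node keeps network traffic proportional to the number of distinct remote nodes rather than the number of destination tasks. The acknowledgement set is what turns best-effort fan-out into a delivery guarantee in this model.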
On-chip communication infrastructure is a central component of modern systems-on-chip (SoCs), and it...
Technology trends suggest that future machines will rely on parallelism to meet increasing performanc...
Future manycore Systems-on-Chip will integrate tens or even hundreds of cores. Tiled architectures h...
Multicast communication is a frequently invoked communication pattern in many parallel algorithms. A...
This paper presents efficient algorithms to implement multicast communication in scalable, wormhole-...
With computing systems becoming ubiquitous, numerous data sets of extremely large size are becoming ...
If the trend of integrating more and more cores onto a single die continues, general-purpose processor...
The recent emergence of large-scale knowledge discovery, data mining and social network analysis, ir...
Multicast is an important collective operation for parallel programs. Some Network Interface Cards (...
High-bandwidth and low-latency switches are commercially available. Using these switches... (©1997 World Scientific)
To efficiently use multicore processors, we need to ensure that almost all data communicatio...
The difference between emerging many-core architectures and their multi-core predecessors goes beyon...
Shared memory is the most popular parallel programming model for multi-core processors, while messag...