Parallel graph-oriented applications expressed in the Bulk-Synchronous Parallel (BSP) and Token Dataflow compute models generate highly structured communication workloads from messages propagating along graph edges. We can statically expose this structure to traffic compilers and optimization tools to reshape and reduce traffic for higher performance (or lower area, lower energy, lower cost). Such offline traffic optimization eliminates the need for complex runtime NoC hardware and enables lightweight, scalable NoCs. We perform load balancing, placement, fanout routing, and fine-grained synchronization to optimize our workloads for large networks of up to 2025 parallel elements for the BSP model and 25 parallel elements for Token Dataflow. This al...
Sparse graph problems are notoriously hard to accelerate on conventional platforms due to irregular ...
As benchmark programs for microprocessor architectures, network-on-chip (NoC) traffic patterns are e...
The increasing popularity of deep neural network (DNN) applications demands high computing power and...
Parallel graph-oriented applications expressed in the Bulk-Synchronous Parallel (BSP) and Token Data...
Dataflow Coprocessor Overlay (DaCO) is an FPGA-tuned dataflow-driven overlay architecture that offer...
The scaling of semiconductor technologies is leading to processors with increasing numbers of cores....
The scaling of MOS transistors into the nanometer regime opens the possibility for creating large Ne...
The scaling of semiconductor technologies is leading to processors with increasing numbers o...
Our work reduces power consumption by minimizing wirelength and hop-count of an asyn...
Chip multiprocessors (CMPs) combine increasingly many general-purpose processor cores on a single ch...
In the last decade, Networks-on-Chips became the leading edge technology due to the growing requirem...
We extend the state-of-the-art DSPIN network-on-chip architecture by defining ...
As the number of cores and threads in manycore compute accelerators such as Graphics Proces...