Parallel graph-oriented applications expressed in the Bulk-Synchronous Parallel (BSP) and Token Dataflow compute models generate highly structured communication workloads from messages propagating along graph edges. We can statically expose this structure to traffic compilers and optimization tools to reshape and reduce traffic for higher performance (or lower area, lower energy, lower cost). Such offline traffic optimization eliminates the need for complex runtime NoC hardware and enables lightweight, scalable NoCs. We perform load balancing, placement, fanout routing, and fine-grained synchronization to optimize our workloads for large networks of up to 2025 parallel elements for the BSP model and 25 parallel elements for Token Dataflow. This al...
In this paper, a NoC traffic monitoring method is proposed for billion cycle application de...
As the key interconnection technique of System on Chip (SoC), Network on Chip ...
As the number of cores and threads in manycore compute accelerators such as Graphics Proces...
Parallel graph-oriented applications expressed in the Bulk-Synchronous Parallel (BSP) and Token Data...
FPGA-based soft processors customized for operations on sparse graphs can deliver significant perfor...
Sparse graph problems are notoriously hard to accelerate on conventional platforms due to irregular ...
Graduation date: 2017. General-purpose Graphics Processing Units (GPGPUs) have become a critical compo...
As benchmark programs for microprocessor architectures, network-on-chip (NoC) traffic patterns are e...
The ever-increasing density of integration makes the NoC a relevant communicat...
Chip multiprocessors (CMPs) combine increasingly many general-purpose processor cores on a single ch...
2018-10-16. Graph analytics has drawn much research interest because of its broad applicability from m...
How do we develop programs that are easy to express, easy to reason about, and able to achieve high ...
Many important applications are organized around long-lived, irregular sparse graphs (e.g...