We present a translation from programs expressed in a functional IR into dataflow networks as an intermediate step within a Haskell-to-Hardware compiler. Our networks exploit pipeline parallelism, particularly across multiple tail-recursive calls, via non-strict function evaluation. To handle the long-latency memory operations common to our target applications, we employ a latency-insensitive methodology that ensures arbitrary delays do not change the functionality of the circuit. We present empirical results comparing our networks against their strict counterparts, showing that non-strictness can mitigate small increases in memory latency and improve overall performance by up to 2x.
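A minimal sketch (not the paper's own IR or compiler) of the kind of tail-recursive function such a Haskell-to-hardware flow targets: accumulator-passing style exposes a chain of recursive calls that a non-strict dataflow network can begin overlapping, rather than waiting for each call to complete before issuing the next. The function name and structure here are illustrative assumptions.

```haskell
-- Illustrative only: a tail-recursive accumulator loop. Each recursive
-- call depends on the previous accumulator value, but under non-strict
-- evaluation a pipelined dataflow network can start the next iteration's
-- control step before the accumulator computation fully resolves.
sumTo :: Int -> Int -> Int
sumTo 0 acc = acc
sumTo n acc = sumTo (n - 1) (acc + n)  -- tail call: maps to a hardware loop

main :: IO ()
main = print (sumTo 10 0)  -- prints 55
```

In hardware, the tail call above becomes a feedback edge in the dataflow graph; latency-insensitive handshaking on that edge is what lets memory delays vary without changing the computed result.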
Domain-specific acceleration is now a “must” for all the computing spectrum, going from high perform...
A method for assessing the benefits of fine-grain parallelism in "real" programs is pres...
Abstraction in hardware description languages stalled at the register-transfer level decades ago, ye...
To provide high performance at practical power levels, tomorrow’s chips will have to consist primari...
Dataflow Architectures have been explored extensively in the past and are now re-evaluated...
A possible direction for exploiting the computational power of multi/many core chips is to rely on a...
We present a technique for implementing dataflow networks as compositional hardware circuits. We fir...
Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effectiv...
High-Level Synthesis (HLS) tools generate hardware designs from high-level programming languages. Th...
In the streaming domain, applications are often described as dataflow graphs. Each node in the graph...
In this paper we show how a simple dataflow processor can be fully implemented using CλaSH,...
Recursive functions and data types pose significant challenges for a Haskell-to-hardware compiler. D...
The term "dataflow" generally encompasses three distinct aspects of computation - a data-driven mode...
In the context of multi-core processors and the trend toward many-core, datafl...