Recent evidence indicates that exploiting locality in dataflow programs could have a dramatic impact on performance. The current trend in the design of dataflow processors suggests a synthesis of traditional non-strict fine-grain instruction execution and strict coarse-grain execution in order to exploit locality. While an increase in instruction granularity favors the exploitation of locality within a single execution thread, the resulting grain size may increase latency among execution threads. This paper evaluates the latency incurred by partitioning fine-grain instructions into coarser-grain threads. We define the concept of a cluster of fine-grain instructions to quantify coarse grain input an...
Implementing locality-aware scheduling algorithms using fine-programming models may genera...
To efficiently utilize the emerging heterogeneous multi-core architecture, it is essential to exploi...
Although they are powerful intermediate representations for compilers, pure dataflow graphs are inco...
A method for assessing the benefits of fine-grain parallelism in "real" programs is pres...
This paper presents an evaluation of our Scheduled Dataflow (SDF) Processor. Recent focus in the fie...
Current computing systems are mostly focused on achieving performance, programmability, energy effic...
In this paper the Scheduled Dataflow (SDF) architecture - a decoupled memory/execution, multithreade...
Science and Engineering advancements depend more and more on computational simulations. These simula...
Dataflow-based fine-grain parallel data-structures provide high-level abstraction to easily write pr...
Increasing on-chip wire delay along with the distributed nature of processing elements, ma...
In this paper we describe a new approach to designing multithreaded architecture that can be used as...
Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effectiv...
The path towards future high performance computers requires architectures able to efficiently run mu...
Coarse-Grained Reconfigurable Architectures (CGRAs) can be employed for accelerating computational wo...
The term "dataflow" generally encompasses three distinct aspects of computation - a data-driven mode...