The need for high-performance computing and low-power operation has led to the emergence of new processor architectures, with most recent designs based on the combination of multiple cores and multiple threads per core. In our work, we explore an architecture in which multiple instruction pipelines merge into a common back-end formed as a network of functional units. This paper focuses on the back-end, and in particular on rapid, low-power execution of loops based on dataflow. We dispatch the loop body instructions onto the network of functional units only once and then let the loop execute in a dataflow manner, with no further instruction issue until the loop completes. In this way, we not only speed up the loop execu...
The trend to develop increasingly more intelligent systems leads directly to a considerable demand f...
Nowadays, we are reaching a point where further improving single thread performance can only be done...
Transport Triggered Architecture (TTA) processors allow unique low level compiler optimizations such...
Bypass delays are expected to grow beyond 1ns as technology scales. These delays necessitate pipelin...
Increased integration in the form of multiple processor cores on a single die, relatively constant d...
Current integration trends embrace the prosperity of single-chip multi-core processors. Although mul...
This dissertation demonstrates that through the careful application of hardware and software techniq...
Future exascale machines will require multi/many-core architectures able to energy-efficiently run multi-...
Energy consumption in embedded systems is strongly dominated by instruction memory organiza...
Pipelined microprocessors allow the simultaneous execution of several machine instructions at a time...
This dissertation presents a novel decoupled latency tolerance technique for 1000-core data parallel...
The recently invented thick control flow (TCF) model packs together an unbounded number of fibers, t...
Embedded systems require maximum performance from a processor within significant constraints in powe...
Mobile and PC/server class processor companies continue to roll out flagship core microarch...
The end of Dennard scaling leads to new research directions that try to cope with the utilization wa...