A large number of algorithms for multidimensional signal processing and scientific computation take the form of iterative stencil loops (ISLs), whose data dependencies span multiple iterations. Because of their complex inner structure, automatic hardware acceleration of such algorithms is traditionally considered a difficult task. In this paper, we introduce an automatic design flow that identifies, in a wide family of two-dimensional data processing algorithms, sub-portions that exhibit a kind of parallelism close to that of ISLs; these are mapped onto a space of highly optimized ad hoc architectures, which is efficiently explored to identify the best implementations with respect to both area and throughput. Experimental resul...
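To make the notion of an iterative stencil loop concrete, the following is a minimal sketch in Python. It is not the design flow described above: the 4-neighbour averaging kernel, the fixed border handling, and the names `stencil_step` and `iterative_stencil_loop` are all assumptions chosen for illustration. It only shows why dependencies span iterations: each cell at iteration t is computed from its neighbours at iteration t-1.

```python
def stencil_step(grid):
    """Apply one 4-neighbour averaging step (illustrative kernel, an assumption).

    Border cells are copied unchanged; the input grid is not modified.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each interior cell depends only on values from the previous iteration.
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 4.0
    return out


def iterative_stencil_loop(grid, iterations):
    """Repeat the stencil step; iteration t reads the result of iteration t-1."""
    for _ in range(iterations):
        grid = stencil_step(grid)
    return grid


if __name__ == "__main__":
    # A 5x5 grid with a single non-zero cell in the centre.
    start = [[0.0] * 5 for _ in range(5)]
    start[2][2] = 16.0
    for row in iterative_stencil_loop(start, 2):
        print(row)
```

After two iterations the value placed in the centre has spread to cells two steps away, which is the cross-iteration dependency that makes such loops hard to parallelize naively.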
Heterogeneous architectures have been widely used in the domain of high perfor...
Recent increase in the complexity of the circuits has brought high-level synth...
The Texas Instruments C66x Digital Signal Processor (DSP) is an embedded processor technology that i...
The automatic generation of hardware implementations for a given algorithm is generally a difficult ...
Stencil computations are array based algorithms that apply a computation to all array elements in a ...
In this paper we propose a design template for stencil computations targeting ...
Traditionally, parallel implementations of multimedia algorithms are carried out manually, since the...
We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-...
The key common bottleneck in most stencil codes is data movement, and prior research has shown that ...
Hardware acceleration is the use of custom hardware architectures to perform some computations faste...
Iterative stencils represent the core computational kernel of many applications belonging to differe...
In this paper we investigate how stencil computations can be implemented on state-of-the-art...
In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel progr...