The adoption of High-Level Synthesis (HLS) tools has significantly reduced accelerator design time. A complex scaling problem that remains is the data transfer bottleneck. To scale up performance, accelerators require huge amounts of data and are often limited by interconnect resources. In addition, the energy spent by the accelerator is often dominated by the transfer of data, either in the form of memory references or data movement on the interconnect. In this paper, we drastically reduce accelerator communication by exploring computation reordering and local buffer usage. Consequently, we present a new analytical methodology to optimize nested loops for inter-tile data reuse with loop transformations like interchange and tiling. We focus...
Accelerators are becoming key elements of computing platforms for both data centers and mobile devic...
Fast and energy efficient processing of data has always been a key requirement in processor design. ...
Effective exploitation of the application-specific parallel patterns and computation operat...
High Level Synthesis tools have reduced accelerator design time. However, a complex scaling problem ...
In modern system-on-chip architectures, specialized accelerators are increasingly used to improve pe...
The world needs special-purpose accelerators to meet future constraints on computation and power con...
The design of specialized accelerators is essential to the success of many modern Systems-on-Chip. E...
This dissertation investigates the communication optimization for customizable domain-specific compu...
High-level synthesis (HLS) is well capable of generating control and computation circuits for FPGA a...
The demand for high performance has driven acyclic computation accelerators into extensive use in mo...
In light of the failure of Dennard scaling and recent slowdown of Moore's Law, both industry and aca...
There is a large, emerging, and commercially relevant class of applications which stands to be enabl...