Current generations of FPGAs create possibilities for innovative, application-specific computation pipelines. In many cases, the pipeline can fully exploit the FPGA’s parallelism only when multiple operands are available concurrently, requiring clusters of values to be fetched from memory. These clusters of values often have fixed organization, as in the eight grid points around an off-grid position that are needed for 3D interpolation of a value at that position. We present a technique for creating custom interleaving of the FPGA’s on-chip memories, giving access to the entire cluster of values in one memory cycle. This technique works on grids of 2, 3, or more dimensions, on many non-rectangular grids, and on cluster organization specific...
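As an illustration of the idea for the 3D interpolation case, the following is a minimal sketch, not the authors' implementation. It assumes a cubic grid with an even number of points per axis and eight independent memory banks, selected by the low bit of each coordinate. The names `bank_and_address` and `cluster_banks` are hypothetical. Under these assumptions, the eight corners of any 2x2x2 cluster land in eight different banks, so all of them could be read in a single memory cycle.

```python
from itertools import product


def bank_and_address(x, y, z, size):
    """Map grid point (x, y, z) to a (bank, address) pair.

    Assumption: `size` is the (even) number of grid points per axis,
    and there are 8 banks, one per parity combination of (x, y, z).
    """
    # The low bit of each coordinate selects one of the 8 banks.
    bank = (x % 2) | ((y % 2) << 1) | ((z % 2) << 2)
    # The remaining bits form a linear address inside that bank.
    half = size // 2
    addr = (x // 2) + half * ((y // 2) + half * (z // 2))
    return bank, addr


def cluster_banks(x0, y0, z0, size):
    """Return (bank, address) for the 8 corners of the cell at (x0, y0, z0)."""
    return [
        bank_and_address(x0 + dx, y0 + dy, z0 + dz, size)
        for dx, dy, dz in product((0, 1), repeat=3)
    ]


if __name__ == "__main__":
    # Each of the 8 corners should report a different bank number.
    for bank, addr in cluster_banks(1, 2, 3, size=8):
        print(f"bank {bank}, address {addr}")
```

Adjacent grid points along any axis always differ in parity, which is why no two corners of a cell collide in the same bank. Other cluster shapes or non-rectangular grids would need a different bank-selection function.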
Some data- and compute-intensive applications can be accelerated by offloading portions of codes to...
Many algorithms and applications in scientific computing exhibit irregular access patterns as consec...
FPGA designs have an immense design space, and there can be an order of magnitude performance differ...
Many compute-intensive applications generate single result values by accessing clusters of nearby po...
The LegUp High-Level Synthesis (HLS) tool permits the synthesis of multi-threaded software into para...
This paper proposes an algorithm for mapping logical to physical memory resources on Field-Programmab...
The performance gap between CPUs and memory has widened significantly since the 1980s, maki...
Developing FPGA implementations with an input specification in a high-level programming lan...
With computing systems becoming ubiquitous, numerous data sets of extremely large size are becoming ...
The high potential performance of FPGAs cannot be exploited if a design suffers a memory bottleneck....
Reconfigurable systems, and in particular, FPGA-based custom computing machines, offer a unique oppo...
Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
Many real-life applications of processor-arrays suffer from memory bandwidth limitations. In many ca...
With the large resource densities available on modern FPGAs it is often the available memory bandwid...
The capacity of FPGA devices has reached the 1-million-LUT level, which provides space to a...