This paper discusses the balance between loop-level parallelism and clock rate for enhancing the performance of DSP applications fully implemented on FPGAs. Loop-level parallelism reduces the total cycle count of an application at the cost of increased routing complexity, which often results in lower clock rates. We analyze loops that can be fully parallelized and show that controlling the number of parallel iterations can achieve better performance than running the loops fully in parallel. We have implemented loop parallelism in our compilation framework and fine-tuned it to enhance the performance of DSP applications that target the Xilinx Virtex-II FPGA chip. Our experimental results show that it is possible to reach a perform...
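The trade-off described above can be illustrated with a minimal sketch. It assumes a simple model in which execution time is the cycle count divided by the clock rate, and in which the achievable clock rate falls as more iterations run in parallel. The function names and the clock table are hypothetical and chosen only for illustration; they are not measurements or code from the paper.

```python
import math


def execution_time_us(iterations, cycles_per_iter, unroll, clock_mhz):
    """Modeled time in microseconds for a fully parallelizable loop.

    `unroll` iterations run in parallel per pass, so the loop needs
    ceil(iterations / unroll) passes of `cycles_per_iter` cycles each.
    Cycles divided by MHz gives microseconds.
    """
    passes = math.ceil(iterations / unroll)
    return passes * cycles_per_iter / clock_mhz


def best_unroll(iterations, cycles_per_iter, clock_by_unroll):
    """Return (unroll, time_us) with the smallest modeled execution time."""
    candidates = (
        (u, execution_time_us(iterations, cycles_per_iter, u, f))
        for u, f in clock_by_unroll.items()
    )
    return min(candidates, key=lambda pair: pair[1])


# HYPOTHETICAL clock rates (MHz) per number of parallel iterations.
# These values are assumptions for illustration, not measured data.
HYPOTHETICAL_CLOCKS = {1: 200.0, 2: 180.0, 4: 150.0, 8: 100.0, 16: 40.0}

if __name__ == "__main__":
    for u, f in HYPOTHETICAL_CLOCKS.items():
        print(u, execution_time_us(16, 1, u, f))
    print("best:", best_unroll(16, 1, HYPOTHETICAL_CLOCKS))
```

Under these assumed clock rates, a 16-iteration loop is fastest with 8 parallel iterations, not with all 16, because the clock-rate penalty of the fully parallel version outweighs its lower cycle count.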
Two ways to exploit chips with a very large number of transistors are multicore processors and progr...
Modern embedded systems for DSP applications are increasingly being implemented on heterogeneous pro...
Applications that require digital signal processing (DSP) functions are typically mapped onto genera...
The embedded DSP blocks in modern Field Programmable Gate Arrays (FPGAs) are highly capable and supp...
Where do all the cycles go when microprocessor applications are implemented spatially as circuits on...
The sophistication of applications and hunger for high quality digital data demands increase...
Field-programmable gate arrays represent an army of logical units which can be organized in a highly...
A Field Programmable Gate Array (FPGA) provides the ability to use, and re-use, hardware with minimal ...
With the large resource densities available on modern FPGAs it is often the available memory bandwid...
This paper shows how temporal parallelism has an important role in the power dissipation reduction i...
Embedded systems require maximum performance from a processor within significant constraints in powe...
The usage of high-level synthesis (HLS) tools for FPGAs has increased significantly over the last ye...
Placement of a large FPGA design now commonly requires several hours, significantly hinderi...
Looping operations impose a significant bottleneck to higher execution performance in embedded appli...
Coarse-grained FPGA overlays built around the runtime programmable DSP blocks in modern FPGAs can ac...