In recent years, the computing performance of virtually all integrated digital circuits, including FPGAs, has increased. Modern FPGAs contain more configurable logic blocks, more memory, and more dedicated computing resources such as DSP blocks. They therefore offer a degree of fine-grained parallelism that classic SIMD processors such as GPUs cannot reach. Furthermore, their power consumption is usually much lower than that of GPUs, which makes them suitable for embedded applications. However, this computing power comes at the cost of more complex and demanding development as well as long synthesis times. The former is nowadays addressed by HLS tools, which simplify the problem formulation. Instead of VHDL or Verilog code, a higher-level langu...