Reconfigurable architectures are good candidates for application accelerators that cannot be set in stone at production time. FPGAs, however, often suffer from the area and performance penalty intrinsic to gate-level reconfigurability. To reduce this overhead, coarse-grained reconfigurable arrays (CGRAs) are reconfigurable at the ALU level, but a successful design needs more than computational power: the main bottleneck is usually memory transfers. Just as the integration of hardwired multiplier and memory blocks enabled FPGAs to implement digital signal processing applications efficiently, in this paper we study a customizable architecture template based on heterogeneous processing elements (multipliers, ALU clusters and memories)...