AbstractGPUs have recently attracted our attention as accelerators on a wide variety of algorithms, including assorted examples within the image analysis field. Among them, wavelets are gaining popularity as solid tools for data mining and video compression, though this comes at the expense of a high computational cost. After proving the effectiveness of the GPU for accelerating the 2D Fast Wavelet Transform [1], we present in this paper a novel implementation on manycore GPUs and multicore CPUs for a high performance computation of the 3D Fast Wavelet Transform (3DFWT). This algorithm poses a challenging access pattern on matrix operators demanding high sustainable bandwidth, as well as mathematical functions with remarkable arithmetic int...