We present a new framework for restoring missing samples in audio signals. It consists of locating audio frames that share similar sparse structures and applying a joint-sparse algorithm to estimate the missing samples. Such similar frames occur in audio signals because of the signals' intrinsic structure: across channels, in the temporal neighborhood of each frame, and non-locally, since patterns are repeated. We propose a fast and robust strategy for locating the similar frames by introducing a spectral cosine similarity that is more suitable than the usual correlation similarity. We present and compare the inpainting versions of three known joint-sparse algorithms and show how they lead to a better reconstruction of the missin...