Whenever multimedia content is shared on the Internet, it undergoes a mutation process: multiple users download, alter, and repost modified versions of the original data, which leads to the diffusion of multiple near-duplicate copies. This effect also applies to audio data (e.g., on audio sharing platforms) and calls for accurate phylogenetic analysis strategies that can uncover the processing history of each copy and identify the original one. This paper proposes a new phylogenetic reconstruction strategy that converts the analyzed audio tracks into spectrogram images and compares them using alignment strategies borrowed from computer vision. With respect to strategies currently available in the literature, th...