The perception of emotions is often suggested to be multimodal in nature, and bimodal presentation of emotional stimuli, compared with unimodal (auditory or visual) presentation, can lead to superior emotion recognition. In previous studies, contrastive aftereffects in emotion perception caused by perceptual adaptation have been shown for faces and for auditory affective vocalizations when adaptors were of the same modality. By contrast, crossmodal aftereffects in the perception of emotional vocalizations have not yet been demonstrated. In three experiments, we investigated the influence of emotional voice as well as dynamic facial video adaptors on the perception of emotion-ambiguous voices morphed on an angry-to-happy continuum. Contrastive aftereffects...