Although multimodality in tourist communication has been widely investigated, published research has so far focused mainly on the interaction between text and pictures in printed and web-based tourist genres, and little work has addressed the aural dimension of audiovisual tourist texts. This study proposes a multimodal analysis of fifty city audio guides in Italian and English, exploring how different ‘voices’ may be combined to create meaning and involve the listener. The analysis focuses on the types of speaker involved in narration and the ways in which they interact. Furthermore, following van Leeuwen’s (1999) model, it shows how different semiotic resources are combined to enhance narration. ...