From playing basketball to ordering at a food counter, we frequently and effortlessly coordinate our attention with others towards a common focus: we look at the ball, or point at a piece of cake. This non-verbal coordination of attention plays a fundamental role in our social lives: it ensures that we refer to the same object, develop a shared language, understand each other's mental states, and coordinate our actions. Models of joint attention generally attribute this accomplishment to gaze coordination. But are visual attentional mechanisms sufficient to achieve joint attention, in all cases? Besides cases where visual information is missing, we show how combining it with other senses can be helpful, and even necessary to certain uses of...