In this paper we describe a new dataset, currently under construction, acquired inside the National Museum of Bargello in Florence. It was recorded with three IP cameras at a resolution of 1280 × 800 pixels and an average frame rate of five frames per second. Sequences were recorded under two scenarios. The first scenario consists of visitors watching different artworks (individuals), while the second consists of groups of visitors watching the same artworks (groups). The dataset is specifically designed to support research on group detection, occlusion handling, tracking, re-identification and behavior analysis. To ease the annotation process, we designed a user-friendly web interface that supports the annotation of bounding boxes, occ...