Seeing allows animals and people alike to gather information from a distance, often with high spatial and temporal resolution. Machines have access to this rich pool of information thanks to their cameras. But, they still do not have the software to process it, in order to transform the raw pixel values into useful information such as nature, position, and function of the surrounding objects. That is one of the reasons why it is still difficult for them to naviguate in an unknown environment and interract with people and objects in an un-planned fashion. However, the design of such a software implies many challenges. Among them, it is hard to compare two images, for insance, in order to recognize that the seen image is similar to another wh...