Machine learning now plays a dominant role in most challenging computer vision problems. This paper advocates an extreme evolution of this interplay, in which visual agents continuously process videos and interact with humans, much as children do, by exploiting life-long learning computational schemes. This opens the challenge of en plein air visual agents, whose behavior is progressively monitored and evaluated by novel mechanisms in which dynamic man-machine interaction plays a fundamental role. Going beyond classic benchmarks, we argue that appropriate crowd-sourcing schemes are suitable for evaluating the performance of visual agents operating in this framework. We provide a proof of concept of this novel view by showing methods and conc...