Spatial Transformer Networks (STNs) have the potential to dramatically improve performance of convolutional neural networks in a range of tasks. By ‘focusing’ on the salient parts of the input using a differentiable affine transform, a network augmented with an STN should have increased performance, efficiency and interpretability. However, in practice, STNs rarely exhibit these desiderata, instead converging to a seemingly meaningless transformation of the input. We demonstrate and characterise this localisation problem as deriving from the spatial invariance of feature detection layers acting on extracted glimpses. Drawing on the neuroanatomy of the human eye we then motivate a solution: foveated convolutions. These parallel convolutions ...
Understanding and predicting the human visual attention mechanism is an active area of research in t...
Human vision possesses a special type of visual processing systems called peripheral vision. Partiti...
International audienceConvolutional networks used for computer vision represent candidate models for...
The classic computational scheme of convolutional layers leverages filter banks that are shared over...
Many animals and humans process the visual field with varying spatial resolution (foveated vision) a...
Many animals and humans process the visual field with a varying spatial resolution (foveated vision)...
Convolutional Neural Networks (CNNs) are extremely efficient, since they exploit the inherent transl...
Exploiting data invariances is crucial for efficient learning in both artificial and biological neur...
This paper does not attempt to design a state-of-the-art method for visual recognition but investiga...
International audienceFoveated vision is a trait shared by many animals, including humans, but its c...
Computer vision has made a significant progress in recent years thanks to advancement in neural netw...
Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to ...
While abundant in biology, foveated vision is nearly absent from computational models and especially...
Modern machine learning models for computer vision exceed humans in accuracy on specific visual reco...
The fovea is an important anatomical landmark of the retina. Detecting the location of the fovea is ...
Understanding and predicting the human visual attention mechanism is an active area of research in t...
Human vision possesses a special type of visual processing systems called peripheral vision. Partiti...
International audienceConvolutional networks used for computer vision represent candidate models for...
The classic computational scheme of convolutional layers leverages filter banks that are shared over...
Many animals and humans process the visual field with varying spatial resolution (foveated vision) a...
Many animals and humans process the visual field with a varying spatial resolution (foveated vision)...
Convolutional Neural Networks (CNNs) are extremely efficient, since they exploit the inherent transl...
Exploiting data invariances is crucial for efficient learning in both artificial and biological neur...
This paper does not attempt to design a state-of-the-art method for visual recognition but investiga...
International audienceFoveated vision is a trait shared by many animals, including humans, but its c...
Computer vision has made a significant progress in recent years thanks to advancement in neural netw...
Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to ...
While abundant in biology, foveated vision is nearly absent from computational models and especially...
Modern machine learning models for computer vision exceed humans in accuracy on specific visual reco...
The fovea is an important anatomical landmark of the retina. Detecting the location of the fovea is ...
Understanding and predicting the human visual attention mechanism is an active area of research in t...
Human vision possesses a special type of visual processing systems called peripheral vision. Partiti...
International audienceConvolutional networks used for computer vision represent candidate models for...