Recently introduced self-supervised methods for image representation learning provide results on par with or superior to those of their fully supervised counterparts, yet the corresponding efforts to explain the self-supervised approaches lag behind. Motivated by this observation, we introduce a novel visual probing framework for explaining self-supervised models by leveraging probing tasks employed previously in natural language processing. The probing tasks require knowledge about semantic relationships between image parts. Hence, we propose a systematic approach to obtain analogs of natural language in vision, such as visual words, context, and taxonomy. We show the effectiveness and applicability of those analogs in the context of explaining self-supervised representations.
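As a rough illustration only (not the framework's actual pipeline, which the abstract does not detail), "visual words" are commonly obtained by clustering local patch embeddings in the spirit of a bag-of-visual-words model. The sketch below assumes a generic patch feature extractor (a fixed random projection stands in for a self-supervised backbone) and uses scikit-learn's KMeans; all function names and parameters here are hypothetical.

```python
# Assumption-based sketch: build a "visual word" vocabulary by clustering
# patch-level features. The random-projection "extractor" is a placeholder
# for whatever self-supervised backbone produces patch embeddings.

import numpy as np
from sklearn.cluster import KMeans


def extract_patch_features(images, patch_size=16, feature_dim=64, seed=0):
    """Cut each image into non-overlapping patches and embed them.

    A real pipeline would embed patches with a learned backbone; a fixed,
    seeded random projection keeps this sketch self-contained.
    """
    rng = np.random.default_rng(seed)
    projection = rng.normal(size=(patch_size * patch_size * 3, feature_dim))
    features = []
    for img in images:  # img: (H, W, 3) float array
        h, w, _ = img.shape
        for y in range(0, h - patch_size + 1, patch_size):
            for x in range(0, w - patch_size + 1, patch_size):
                patch = img[y:y + patch_size, x:x + patch_size].reshape(-1)
                features.append(patch @ projection)
    return np.stack(features)


def build_visual_vocabulary(patch_features, num_words=256, seed=0):
    """Cluster patch embeddings; each cluster centre acts as a visual word."""
    kmeans = KMeans(n_clusters=num_words, n_init=10, random_state=seed)
    kmeans.fit(patch_features)
    return kmeans


def image_to_visual_words(image, vocabulary, patch_size=16):
    """Map an image to the sequence of visual-word ids of its patches."""
    feats = extract_patch_features([image], patch_size=patch_size)
    return vocabulary.predict(feats)


if __name__ == "__main__":
    # Toy data: 8 random RGB "images" of size 64x64.
    rng = np.random.default_rng(1)
    images = [rng.random((64, 64, 3)) for _ in range(8)]

    feats = extract_patch_features(images)
    vocab = build_visual_vocabulary(feats, num_words=16)
    print("visual-word ids for first image:",
          image_to_visual_words(images[0], vocab))
```

Once every image is expressed as a sequence of visual-word ids, language-style probing tasks (e.g. over context or taxonomy of those words) can be posed on top of the representation; the clustering step above is only one conventional way to obtain such a vocabulary.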