The problem of learning language models from large text corpora has been widely studied within the computational linguistics community. However, little is known about the performance of these language models when applied to the computer vision domain. In this work, we compare representative models: a window-based model, a topic model, a distributional memory, and a commonsense knowledge database, ConceptNet, in two visual recognition scenarios: human action recognition and object prediction. We examine whether the knowledge extracted from texts through these models is compatible with the knowledge represented in images. We determine the usefulness of different language models in aiding the two visual recognition tasks. The study shows th...
Feature selection is very important for many computer vision applications. However, it is hard to fi...
People typically learn through exposure to visual facts associated with linguistic descriptions. For...
Texts and images provide alternative, yet orthogonal views of the same underlying cognitive concept....
During the last decade, machine learning techniques have been used successfully in many applications...
Thesis (Ph.D.)--University of Washington, 2017-09. A goal of artificial intelligence is to create a sy...
Large language models are known to suffer from the hallucination problem in that they are prone to o...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
In recent years, joint text-image embeddings have significantly improved thanks to the development o...
Current language models have been criticised for learning language from text alone without connectio...
Generating sentences from images has historically been performed with standalone Computer Vision sys...
While text generated by current vision-language models may be accurate and syntactically correct, it...