Everyday news coverage has shifted from traditional broadcasts towards a wide range of presentation formats such as first-hand, unedited video footage. Datasets that reflect the diverse array of multimodal, multilingual news sources available online could be used to teach models to benefit from this shift, but existing news video datasets focus on traditional news broadcasts produced for English-speaking audiences. We address this limitation by constructing MultiVENT, a dataset of multilingual, event-centric videos grounded in text documents across five target languages. MultiVENT includes both news broadcast videos and non-professional event footage, which we use to analyze the state of online news videos and how they can be leveraged to b...
Nowadays, many influential security-related facts are reported multiple times by different sources a...
We propose a new approach to recognize objects and scenes in news videos motivated by the availabili...
This paper integrates techniques in natural language processing and computer vision to improve recog...
© 2019, Springer Nature Switzerland AG. This paper describes the combination of advanced technologie...
Multilingual text-video retrieval methods have improved significantly in recent years, but the perfo...
International audienceHuman information processing is inherently multimodal, and language is best un...
Every hour, huge amounts of visual contents are posted on social media and user-generated content pl...
In this paper, we introduce How2, a multimodal collection of instructional videos with English subti...
We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cro...
VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access...
This paper describes the combination of advanced technologies for social-media-based story detection...
Qlusty generates videos describing the coverage of the same event by different news outlets automati...
Abstract In this paper we describe a multi-strategy approach to improving semantic extraction from n...
With the huge amount of data that is collected every day and shared on the internet, many recent stu...
The VideoCLEF track, introduced in 2008, aims to develop and evaluate tasks related to analysis of a...
Nowadays, many influential security-related facts are reported multiple times by different sources a...
We propose a new approach to recognize objects and scenes in news videos motivated by the availabili...
This paper integrates techniques in natural language processing and computer vision to improve recog...
© 2019, Springer Nature Switzerland AG. This paper describes the combination of advanced technologie...
Multilingual text-video retrieval methods have improved significantly in recent years, but the perfo...
International audienceHuman information processing is inherently multimodal, and language is best un...
Every hour, huge amounts of visual contents are posted on social media and user-generated content pl...
In this paper, we introduce How2, a multimodal collection of instructional videos with English subti...
We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cro...
VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access...
This paper describes the combination of advanced technologies for social-media-based story detection...
Qlusty generates videos describing the coverage of the same event by different news outlets automati...
Abstract In this paper we describe a multi-strategy approach to improving semantic extraction from n...
With the huge amount of data that is collected every day and shared on the internet, many recent stu...
The VideoCLEF track, introduced in 2008, aims to develop and evaluate tasks related to analysis of a...
Nowadays, many influential security-related facts are reported multiple times by different sources a...
We propose a new approach to recognize objects and scenes in news videos motivated by the availabili...
This paper integrates techniques in natural language processing and computer vision to improve recog...