In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations. We also present integrated sequence-to-sequence baselines for machine translation, automatic speech recognition, spoken language translation, and multimodal summarization. By making available data and code for several multimodal natural language tasks, we hope to stimulate more research on these and similar challenges, to obtain a deeper understanding of multimodality in language processing
Ces dernières années, les méthodes d'apprentissage profond ont permis de créer des modèles neuronaux...
Current multimodal data processing methods use deep learning to combine complementary visual and tex...
How information is created, shared and consumed has changed rapidly in recent decades, in part thank...
International audienceHuman information processing is inherently multimodal, and language is best un...
Multimodal machine translation involves drawing information from more than one modality, based on th...
Multimodal machine translation involves drawing information from more than one modality, based on th...
Human speech processing is often a multimodal process combining audio and visual processing. Eyes a...
Since most worldly phenomena can be expressed via language, language is a crucial medium for transfe...
[EN] Massive Open Online Courses (MOOCs) and Open Educational Resources (OER) are rapidly growing, b...
This paper introduces BD2BB, a novel language and vision benchmark that requires multimodal models c...
This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LI...
One of the factors that have hindered progress in the areas of sign language recognition, translatio...
Research on video- and audio-recordings of spontaneous naturally-occurring conversation in English h...
Artificial Intelligence (AI) has transformed the way we interact with technology e.g., chatbots, voi...
All language use is a multimodal endeavor. Language processing must deal with both auditory and visu...
Ces dernières années, les méthodes d'apprentissage profond ont permis de créer des modèles neuronaux...
Current multimodal data processing methods use deep learning to combine complementary visual and tex...
How information is created, shared and consumed has changed rapidly in recent decades, in part thank...
International audienceHuman information processing is inherently multimodal, and language is best un...
Multimodal machine translation involves drawing information from more than one modality, based on th...
Multimodal machine translation involves drawing information from more than one modality, based on th...
Human speech processing is often a multimodal process combining audio and visual processing. Eyes a...
Since most worldly phenomena can be expressed via language, language is a crucial medium for transfe...
[EN] Massive Open Online Courses (MOOCs) and Open Educational Resources (OER) are rapidly growing, b...
This paper introduces BD2BB, a novel language and vision benchmark that requires multimodal models c...
This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LI...
One of the factors that have hindered progress in the areas of sign language recognition, translatio...
Research on video- and audio-recordings of spontaneous naturally-occurring conversation in English h...
Artificial Intelligence (AI) has transformed the way we interact with technology e.g., chatbots, voi...
All language use is a multimodal endeavor. Language processing must deal with both auditory and visu...
Ces dernières années, les méthodes d'apprentissage profond ont permis de créer des modèles neuronaux...
Current multimodal data processing methods use deep learning to combine complementary visual and tex...
How information is created, shared and consumed has changed rapidly in recent decades, in part thank...