A significant challenge for NLG is integrating generation technology with real-world applications, especially multimedia and multimodal applications, which have much more stringent synchronization constraints. We have implemented a system that creates narrative text to accompany artificial documentaries (either as subtitles or to be synthesized for voiceovers), which involves coordination problems in ordering and structuring the narratives in synchrony with an image animation component. In this paper we present a flexible, multilingual generation architecture for producing documentary narratives that allows us to tackle the problems related to the selection and organization of content to be coupled with visual feedback, the selection of r...
This work explores to what extent it is possible to automatically generate films from textual descri...
Generating natural language descriptions for visual data links computer vision and computational lin...
© 2021 IEEE. Previous models for vision-to-language generation tasks usually pretrain a visual encoder...
Automatically constructing a complete documentary or educational film from scattered pieces of image...
Recent interest in the use of multimedia presentations and multimodal interfaces has raised the ne...
The ongoing TIWO project is investigating the synthesis of language technologies, like information e...
Natural Language Generation (NLG) systems in English have been well established for the last two dec...
The problem of describing images through natural language has gained importance in the computer vis...
This electronic version was submitted by the student author. The certified thesis is available in th...
This paper documents the results of a research project that deals with the application of an artific...
This paper presents techniques for multimedia annotation and their application to video summarizati...
In this paper we address two important issues in generating spoken language within a multimedia syst...