Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story with a global consistency across dynamic scenes and characters. Current works still struggle with output images' quality and consistency, and rely on additional semantic information or auxiliary captioning networks. To address these challenges, we first introduce a new sentence representation, which incorporates word information from all story sentences to mitigate the inconsistency problem. Then, we propose a new discriminator with fusion features and further extend the spatial attention to improve image quality and story consistency. Extensive experiments on different datasets and human evaluation demonstrate the superior performan...
Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with ...
Automatic generation of textual stories from visual data representation, known as visual storytellin...
Humans have the innate ability to perceive an image just by looking at it, for us images are not jus...
There has been a recent explosion of impressive generative models that can produce high quality imag...
Story visualization aims to generate a series of images, semantically matching a given sequence of s...
We address the problem of visual storytelling, i.e., generating a story for a given sequence of imag...
The automatic generation of realistic images directly from a story text is a very challenging proble...
Automatic generation of natural language description for individual images (a.k.a. image captioning)...
Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity. Re...
Impressive image captioning results (i.e., an objective description for an image) are achieved with ...
Automatic image captioning, a highly challenging research problem, aims to understand and describe t...
We summarize recent developments in our platform for symbolically representing and reasoning over hu...
Computerized systems capable of generating high-level story descriptions have many potential real-li...
With the widespread availability of image captioning at a sentence level, how to automatically gener...
Humans spend a large amount of time listening, watching, and reading stories. We argue that the abil...
Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with ...
Automatic generation of textual stories from visual data representation, known as visual storytellin...
Humans have the innate ability to perceive an image just by looking at it, for us images are not jus...
There has been a recent explosion of impressive generative models that can produce high quality imag...
Story visualization aims to generate a series of images, semantically matching a given sequence of s...
We address the problem of visual storytelling, i.e., generating a story for a given sequence of imag...
The automatic generation of realistic images directly from a story text is a very challenging proble...
Automatic generation of natural language description for individual images (a.k.a. image captioning)...
Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity. Re...
Impressive image captioning results (i.e., an objective description for an image) are achieved with ...
Automatic image captioning, a highly challenging research problem, aims to understand and describe t...
We summarize recent developments in our platform for symbolically representing and reasoning over hu...
Computerized systems capable of generating high-level story descriptions have many potential real-li...
With the widespread availability of image captioning at a sentence level, how to automatically gener...
Humans spend a large amount of time listening, watching, and reading stories. We argue that the abil...
Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with ...
Automatic generation of textual stories from visual data representation, known as visual storytellin...
Humans have the innate ability to perceive an image just by looking at it, for us images are not jus...