People can describe spatial scenes in language and, conversely, create images from linguistic descriptions. Current systems, however, come nowhere near human ability when it comes to reconstructing a scene from a given text, and even the steady progress of Transformer-based models has not closed this gap so far. This task, the automatic generation of a 3D scene from an input text, is called text-to-3D scene generation. The key challenges, and the focus of this dissertation, relate to the following topics: (a) analyses of how well current language models understand spatial information, how static embeddings compare, and whether they can be improved by anaphora resolution...
Code and models are publicly available at https://github.com/cshizhe/vil3dref.
WordsEye is a system for converting from English text into three-dimensional graphical scenes that r...
Text-to-scene conversion requires knowledge about how actions and locations are expressed in languag...
Text-to-scene generation systems take input in the form of a natural language text and output a 3D s...
3D graphics scenes are difficult to create, requiring users to learn and utilize a series of complex...
The ability to map descriptions of scenes to 3D geometric representations has many applications in a...
We address the grounding of natural language to concrete spatial constraints, and inference of impl...
Natural language is an easy and effective medium for describing visual ideas and mental images. Thus...
The two popular datasets ScanRefer [16] and ReferIt3D [3] connect natural language to real-world 3D ...
We outline ongoing work on WordsEye, a text-to-scene generation system. While WordsEye (Coyne and Sp...
Spatial relations play an important role in our understanding of language. In particular, they are a...
Learning descriptive 3D features is crucial for understanding 3D scenes with diverse objects and com...
We present an interactive text to 3D scene generation system that learns the expected spatial layout...