This dataset is used in the paper "Semi-supervised Multimodal Representation Learning through a Global Workspace" (Devillers et al., 2023, under review). To load it, use the code provided at https://github.com/bdvllrs/bimGW. The dataset consists of 32x32 pixel images of shapes that vary in several attributes (size, location, rotation, color). Each image is paired with its ground-truth attributes and an English natural-language description of the image. It comprises a training set of 500,000 samples and validation and test sets of 1,000 samples each. It also includes precomputed 12-dimensional visual features (from a VAE) and precomputed BERT features of the text descriptions.
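As a rough illustration of the composition described above, one sample and the three splits could be modelled as follows. This is only a sketch: the class `ShapeSample`, its field names and types, the RGB color encoding, and the `validate` helper are all hypothetical and are not taken from the bimGW code, which remains the reference way to load the data. Only the split sizes, the 32x32 image size, and the 12-dimensional visual features come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Values stated in the dataset description.
SPLIT_SIZES: Dict[str, int] = {"train": 500_000, "val": 1_000, "test": 1_000}
IMAGE_SIZE = 32          # images are 32x32 pixels
VISUAL_LATENT_DIM = 12   # precomputed VAE features are 12-dimensional


@dataclass
class ShapeSample:
    """Hypothetical record for one sample; names and types are illustrative."""
    image: List[list]                 # IMAGE_SIZE rows of IMAGE_SIZE pixels
    size: float                       # attribute: size
    location: Tuple[float, float]     # attribute: location (x, y)
    rotation: float                   # attribute: rotation
    color: Tuple[int, int, int]       # attribute: color (RGB is an assumption)
    description: str                  # English natural-language description
    visual_latent: Tuple[float, ...]  # precomputed VAE features
    # BERT features are omitted: their dimensionality is not stated.


def validate(sample: ShapeSample) -> None:
    """Check the two shapes that the description pins down."""
    rows_ok = len(sample.image) == IMAGE_SIZE
    cols_ok = all(len(row) == IMAGE_SIZE for row in sample.image)
    if not (rows_ok and cols_ok):
        raise ValueError("image must be 32x32 pixels")
    if len(sample.visual_latent) != VISUAL_LATENT_DIM:
        raise ValueError("visual features must be 12-dimensional")
```

A record built from real data would be passed to `validate` once after loading; the check is deliberately limited to the properties the description actually specifies.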
Paper presented at: the 51st Annual Meeting of the Association for Computational Linguistics, ...
In this study, a global shape descriptor that we call Mixture of Poses (MoP) is proposed to solve hu...
In this thesis, we investigate many aspects to extract shape proxies to enable perceptually sound...
3D shapes come in varied representations from a set of points to a set of images, each capturing dif...
Dataset containing 10000 images of a geometric shape with varying sizes and gray shades and a unifor...
In this paper, we describe a classification framework for binary shapes that have scale, rotation an...
Crowd-sourced databases such as Flickr for images or Google 3D Warehouse for 3D meshes provide vast...
Computer vision aims to teach machines and algorithms to 'see' with the ultimate goal of creating 'i...
This thesis primarily investigates the potential of the Pairwise Geometric Histogram (PGH) represent...
We present a new local descriptor for 3D shapes, directly applicable to a wide range of shape analys...
Low-level cues in an image not only allow one to infer higher-level information like the presence of an ...
The visual representation of shape reduces a high-dimensional input into a smaller set of more infor...
In recent years, joint text-image embeddings have significantly improved thank...
Unsupervised image-to-image translation techniques are able to map local texture between two domains...
This thesis proposes machine learning algorithms for processing geometry by example. Each algorithm ...