In this paper, we present a cross-modal recipe retrieval framework, Transformer-based Network for Large Batch Training (TNLBT), which is inspired by ACME (Adversarial Cross-Modal Embedding) and H-T (Hierarchical Transformer). TNLBT aims to accomplish retrieval tasks while generating images from recipe embeddings. We apply a Hierarchical Transformer-based recipe text encoder, a Vision Transformer (ViT)-based recipe image encoder, and an adversarial network architecture to enable better cross-modal embedding learning for recipe texts and images. In addition, we use self-supervised learning to exploit the rich information in recipe texts that have no corresponding images. Since contrastive learning could benefit from a larger batch size a...
Transformer-based architectures represent the state of the art in sequence modeling tasks like machi...
Cross-modal hashing is usually regarded as an effective technique for large-scale textual-visual cro...
The domain of analysis and synthesis of food images is gaining increasing research interest due to i...
We propose a novel non-parametric method for cross-modal recipe retrieval which is applied on top of...
In this paper, we introduce Recipe1M+, a new large-scale, structured corpus of over one million cook...
In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipe...
Food is significant to human daily life. In this paper, we are interested in learning structural rep...
Learning effective recipe representations is essential in food studies. Unlike what has been develop...
National Research Foundation (NRF) Singapore under International Research Centres in Singapore Fundi...
Tracking food intake is a key point for diet management. To simplify the recording process, research...
This paper deals with automatic systems for image recipe recognition. For this purpose, we compare a...
Recent advances in the machine learning community allowed different use cases ...
Image-text matching is an interesting and fascinating task in modern AI research. Despite the evolut...
Most existing cross-modal retrieval methods employ two-stream encoders with different architectures ...