We propose a novel non-parametric method for cross-modal recipe retrieval that is applied on top of precomputed image and text embeddings. By combining our method with standard approaches for building image and text encoders, trained independently with a self-supervised classification objective, we create a baseline model that outperforms most existing methods on a challenging image-to-recipe task. We also use our method to compare image and text encoders trained with different modern approaches, thereby addressing issues that hinder the development of novel methods for cross-modal recipe retrieval. Finally, we show how to use the insights from this model comparison to extend our baseline model with a standard triplet loss, which further improves state-of-the-art results.
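To make the setting concrete, the sketch below illustrates the two standard building blocks the abstract refers to, not the paper's own non-parametric method: image-to-recipe retrieval by cosine similarity over precomputed, independently trained embeddings, and a standard triplet loss used to refine them. All names, dimensions, and the margin value are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the authors' exact method):
# retrieval over precomputed embeddings plus a standard triplet loss.
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Normalize rows to unit length so dot products equal cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def image_to_recipe_retrieval(image_embs, recipe_embs, k=10):
    """Rank recipes for each query image; return the top-k recipe indices per image."""
    sims = l2_normalize(image_embs) @ l2_normalize(recipe_embs).T  # (n_images, n_recipes)
    return np.argsort(-sims, axis=1)[:, :k]


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss: pull the matching recipe embedding closer to the image
    embedding than a non-matching one by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


# Toy usage with random 512-d embeddings (the dimension is an assumption).
rng = np.random.default_rng(0)
img = rng.normal(size=(100, 512))
rec = rng.normal(size=(1000, 512))
top10 = image_to_recipe_retrieval(img, rec, k=10)
print(top10.shape)  # (100, 10)
```

Because retrieval here is a pure nearest-neighbor lookup over fixed embeddings, any improvement must come from the embedding spaces themselves, which is why a loss such as the triplet loss above is a natural extension on top of the baseline.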