Dense passage retrieval aims to retrieve passages relevant to a query from a large corpus based on dense representations (i.e., vectors) of the query and the passages. Recent studies have explored improving pre-trained language models to boost dense retrieval performance. This paper proposes CoT-MAE (ConTextual Masked Auto-Encoder), a simple yet effective generative pre-training method for dense passage retrieval. CoT-MAE employs an asymmetric encoder-decoder architecture that learns to compress sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding. Specifically, self-supervised masked auto-encoding learns to model the semantics of the tokens inside a text span, and context-supervise...
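Since the abstract describes the pre-training objective only in prose, here is a minimal PyTorch sketch of the asymmetric encoder-decoder idea it names: a deep encoder compresses a masked text span into a dense vector, and a deliberately shallow decoder reconstructs a masked neighboring span conditioned on that vector. All module names, sizes, and the single-layer decoder depth are illustrative assumptions, not the paper's exact configuration; positional embeddings and the masking logic are omitted for brevity.

```python
import torch
import torch.nn as nn

class CoTMAESketch(nn.Module):
    """Hypothetical sketch of contextual masked auto-encoding (not the official code)."""

    def __init__(self, vocab_size=30522, d_model=768, enc_layers=12, dec_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True), enc_layers)
        # Asymmetry: the decoder is deliberately weak (assumed one layer here),
        # so the encoder's dense vector must carry most of the semantics.
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True), dec_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, span_a_masked, span_b_masked):
        # Self-supervised MAE: model the tokens inside span A from span A itself.
        h_a = self.encoder(self.embed(span_a_masked))
        logits_a = self.lm_head(h_a)
        # Context-supervised MAE: prepend span A's dense vector (first position)
        # to the masked contextual span B; recovering B's tokens then requires
        # cross-span semantics to be compressed into that single vector.
        cls_a = h_a[:, :1]                                  # (batch, 1, d_model)
        dec_in = torch.cat([cls_a, self.embed(span_b_masked)], dim=1)
        logits_b = self.lm_head(self.decoder(dec_in)[:, 1:])
        return logits_a, logits_b

# Usage: cross-entropy over the masked positions of both spans, losses summed.
model = CoTMAESketch()
span_a = torch.randint(0, 30522, (2, 64))    # masked text span A (token ids)
span_b = torch.randint(0, 30522, (2, 64))    # masked contextual span B
logits_a, logits_b = model(span_a, span_b)
```

The asymmetry is the design point: because the decoder is too weak to model span B on its own, training pushes cross-span semantics into the encoder's dense vector, which is the representation used for retrieval.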
Despite their recent popularity and well-known advantages, dense retrievers still lag behind sparse ...
Dual-encoder-based dense retrieval models have become the standard in IR. They employ large Transfor...
Pre-training on larger datasets with ever-increasing model size is now a proven recipe for increased...
Recently, methods have been developed to improve the performance of dense passage retrieval by using...
Pre-trained language models (PTMs) have been shown to yield powerful text representations for dense pas...
Recently, retrieval models based on dense representations have become dominant in passage retrieval tasks, d...
The advent of contextualised language models has brought gains in search effectiveness, not just whe...
Dense retrievers encode texts and map them in an embedding space using pre-trained language models. ...
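To make the embedding-space view concrete, here is a minimal sketch of bi-encoder scoring: queries and passages are encoded independently and ranked by vector similarity. The encode function below is a hypothetical stand-in (a real dense retriever runs a pre-trained language model here); only the dot-product ranking step is the point.

```python
import torch
import torch.nn.functional as F

def encode(texts):
    """Hypothetical stand-in: a real dense retriever would run a PLM (e.g. BERT)."""
    return F.normalize(torch.randn(len(texts), 768), dim=-1)

queries = ["what is dense retrieval"]
passages = ["Dense retrieval maps text to vectors.", "An unrelated passage."]

# Rank passages by similarity in the shared embedding space (dot product
# of unit vectors, i.e. cosine similarity).
scores = encode(queries) @ encode(passages).T   # (num_queries, num_passages)
best = scores.argmax(dim=-1)                    # index of the top passage per query
```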
Retriever-reader models achieve competitive performance across many different NLP tasks such as open...
Many recent approaches to passage retrieval use dense embeddings generated from deep neural mo...
The text retrieval task is mainly performed in two ways: the bi-encoder approach and the generative ...
Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive per...
Text retrieval is a long-standing research topic in information seeking, where a system is required ...
Recent research demonstrates the effectiveness of using pretrained language models (PLMs) to improve ...
In the field of information retrieval, passages related to a query are usually easy to obtain, and the pas...