Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive performance by training on large datasets of question-passage pairs. In this work we ask whether this dependence on labeled data can be reduced via unsupervised pretraining that is geared towards ODQA. We show that this is indeed possible, via a novel pretraining scheme designed for retrieval. Our "recurring span retrieval" approach uses recurring spans across passages in a document to create pseudo examples for contrastive learning. Our pretraining scheme directly controls for term overlap across pseudo queries and relevant passages, allowing the model to capture both lexical and semantic relations between them. The resulting model, named Spider, performs...
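To make the pseudo-example construction concrete, below is a minimal Python sketch of how recurring spans across a document's passages might be turned into (query, positive passage) pairs for contrastive pretraining. It is based only on the description in the abstract: the function names, the span-length bounds, and the span-removal probability are all illustrative assumptions, not the paper's actual implementation.

```python
import random

def find_recurring_spans(passages, min_len=2, max_len=10):
    """Return token spans that occur in more than one passage of a document.

    min_len/max_len are assumed bounds on span length (in tokens),
    chosen here purely for illustration.
    """
    span_to_passages = {}
    for idx, passage in enumerate(passages):
        tokens = passage.split()
        seen = set()  # count each span at most once per passage
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                span = tuple(tokens[i:i + n])
                if span not in seen:
                    seen.add(span)
                    span_to_passages.setdefault(span, []).append(idx)
    return {s: idxs for s, idxs in span_to_passages.items() if len(idxs) > 1}

def make_pseudo_example(passages, span, idxs, remove_span_prob=0.5):
    """Build one (pseudo query, positive passage) pair.

    Two passages share the recurring span: one becomes the pseudo query,
    the other the positive passage. Deleting the span from the query with
    some probability controls the degree of term overlap between query and
    positive, so the model sees both lexically similar and lexically
    dissimilar pairs. The 0.5 probability is an assumption.
    """
    q_idx, pos_idx = random.sample(idxs, 2)
    query = passages[q_idx]
    if random.random() < remove_span_prob:
        query = query.replace(" ".join(span), " ")
    return " ".join(query.split()), passages[pos_idx]

# Usage sketch: generate contrastive pairs from one document's passages.
passages = ["...", "..."]  # passages of a single document
for span, idxs in find_recurring_spans(passages).items():
    query, positive = make_pseudo_example(passages, span, idxs)
```

Dropping the span from a fraction of pseudo queries is what lets the scheme directly control term overlap: the model cannot rely on the shared span alone and must also learn semantic relations between query and passage.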