Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. The method begins by generating a small number of synthetic queries using an expensive LLM. After that, a much less expensive one is used to create large numbers of synthetic queries, which are used to fine-tune a family of reranker models. These rerankers are then distilled into a single efficient retriever for use in the target domain. We show that this technique boosts zero...
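The pipeline described in this abstract can be sketched at a high level. This is a minimal illustrative skeleton, not the paper's implementation: `generate_queries` is a hypothetical stand-in for any LLM call, and the model names and counts are assumptions chosen only to show the two-stage cost structure (a few seed queries from an expensive model, then bulk generation with a cheap one).

```python
# Hypothetical sketch of the two-stage synthetic-query pipeline described
# above. generate_queries() is a stand-in for an actual LLM call; all
# names and counts here are illustrative assumptions.

def generate_queries(llm, passages, n_per_passage):
    """Stand-in for an LLM call that writes n queries per passage."""
    return [(f"{llm}-query-{i}-for-{p}", p)
            for p in passages
            for i in range(n_per_passage)]

def two_stage_synthesis(passages, seed_n=1, bulk_n=5):
    # Stage 1: a small number of high-quality seed queries from an
    # expensive LLM.
    seeds = generate_queries("expensive-llm", passages, seed_n)
    # Stage 2: those seeds would be folded into prompts for a much
    # cheaper LLM, which produces the bulk of the synthetic data.
    bulk = generate_queries("cheap-llm", passages, bulk_n)
    return seeds + bulk

pairs = two_stage_synthesis(["doc-a", "doc-b"])
# The resulting (query, passage) pairs would then fine-tune a family of
# rerankers, which are distilled into one efficient target-domain retriever.
```

The point of the split is purely economic: the expensive model sets the query style and quality bar, while the cheap model scales the dataset to fine-tuning size.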
Recent studies have demonstrated the great potential of Large Language Models (LLMs) serving as zero...
While large language models (LLMs) like GPT-4 have recently demonstrated astonishing zero-shot capab...
Dense retrieval models have predominantly been studied for English, where models have shown great su...
In this work, we propose a simple method that applies a large language model (LLM) to large-scale re...
Recent work has shown that small distilled language models are strong competitors to models that are...
Dense retrieval (DR) converts queries and documents into dense embeddings and measures the similarit...
We propose a simple and effective re-ranking method for improving passage retrieval in open question...
Retrieval with extremely long queries and documents is a well-known and challenging task in informat...
Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive per...
Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications ...
Recent approaches to Open-domain Question Answering refer to an external knowledge base using a retr...
Although neural information retrieval has witnessed great improvements, recent works showed that the...
Neural ranking methods based on large transformer models have recently gained significant attention ...
Pseudo Relevance Feedback (PRF) is known to improve the effectiveness of bag-of-words retrievers. At...
Query rewriting plays a vital role in enhancing conversational search by transforming context-depend...