Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the data quality of their underlying retrieval corpus. In this paper, we propose an algorithm based on multilinear extension for evaluating the data importance of retrieved data points. There are exponentially many terms in the multilinear extension, and one key contribution of this paper is a polynomial time algorithm that computes exactly, given a retrieval-augmented model with an additive utility function and a validation set, the data importance of data points in the retrieval corpus using the multilinear e...
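To make the multilinear extension named in the abstract above concrete, here is a minimal sketch that evaluates it by brute-force enumeration over all subsets of a tiny corpus. This is the exponential-cost definition, not the polynomial-time algorithm the abstract describes. The function names, the toy per-point values, and the simple sum-over-points utility are all assumptions made for illustration only.

```python
from itertools import combinations


def multilinear_extension(utility, weights):
    """Brute-force F(w) = sum over subsets S of
    U(S) * prod_{i in S} w_i * prod_{i not in S} (1 - w_i).
    Cost is exponential in len(weights)."""
    n = len(weights)
    total = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            prob = 1.0
            for i, w in enumerate(weights):
                prob *= w if i in chosen else (1.0 - w)
            total += prob * utility(chosen)
    return total


def importance(utility, weights, i):
    """Partial derivative of F with respect to w_i.
    F is linear in each coordinate, so the derivative equals
    F(w with w_i = 1) - F(w with w_i = 0)."""
    with_point = list(weights)
    with_point[i] = 1.0
    without_point = list(weights)
    without_point[i] = 0.0
    return (multilinear_extension(utility, with_point)
            - multilinear_extension(utility, without_point))


# Hypothetical toy setup: each corpus point contributes a fixed value,
# so the utility of a subset is the sum of its members' values.
POINT_VALUES = [0.5, -0.2, 0.1]


def additive_utility(subset):
    return sum(POINT_VALUES[i] for i in subset)


weights = [0.5, 0.5, 0.5]
scores = [importance(additive_utility, weights, i) for i in range(len(weights))]
```

In this toy setting each importance score matches the point's own value up to floating-point error, so the point with a negative value is the one flagged as harmful. With a utility that depends on which points are actually retrieved, the scores would no longer separate this cleanly.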
In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model (REVEAL) that lear...
Pseudo Relevance Feedback (PRF) is known to improve the effectiveness of bag-of-words retrievers. At...
Dense retrieval models have predominantly been studied for English, where models have shown great su...
Retrieval-augmented language models (RALMs) represent a substantial advancement in the capabilities ...
In this work, we propose a simple method that applies a large language model (LLM) to large-scale re...
Information retrieval algorithms leverage various collection statistics to improve performance. Beca...
Recent work has shown that small distilled language models are strong competitors to models that are...
Augmenting language models with a retrieval mechanism has been shown to significantly improve their ...
The recent decade has witnessed an explosive growth of online information with the birth of Web. Sea...
Text retrieval is a long-standing research topic on information seeking, where a system is required ...
Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount o...
While large language models (LLMs) are equipped with longer text input capabilities than before, the...
Pre-training on larger datasets with ever increasing model size is now a proven recipe for increased...
This thesis presents a series of conceptual and empirical developments on the ranking and retrieval ...
A long query provides more useful hints for searching relevant documents, but it is likely to introd...