Pre-trained language models (PTMs) have been shown to yield powerful text representations for the dense passage retrieval task. Masked Language Modeling (MLM) is a major sub-task of the pre-training process. However, we found that the conventional random masking strategy tends to select a large number of tokens that have limited effect on the passage retrieval task (e.g., stop-words and punctuation). Noting that term importance weights can provide valuable information for passage retrieval, we propose an alternative retrieval-oriented masking (ROM) strategy, in which more important tokens have a higher probability of being masked out, to capture this straightforward yet essential information to facilitate the language model pre...
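The idea of masking important tokens more often can be illustrated with a minimal sketch. The abstract does not state how term weights are turned into masking probabilities, so everything below is an assumption made for illustration: the function names `rom_mask_probabilities` and `rom_mask`, the proportional scaling rule, the 15% masking budget, and the example weights are all hypothetical and are not taken from the paper.

```python
import random


def rom_mask_probabilities(weights, mask_rate=0.15):
    """Turn non-negative term weights into per-token masking probabilities.

    Assumption (not from the paper): probabilities are proportional to the
    weights and scaled so that, before clipping at 1.0, they sum to
    mask_rate * len(weights), i.e. the usual expected number of masks.
    """
    n = len(weights)
    if n == 0:
        return []
    total = sum(weights)
    if total <= 0:
        # No usable weights: fall back to uniform random masking.
        return [mask_rate] * n
    budget = mask_rate * n
    return [min(1.0, budget * w / total) for w in weights]


def rom_mask(tokens, weights, mask_rate=0.15, mask_token="[MASK]", rng=None):
    """Replace each token by mask_token with its weight-dependent probability."""
    rng = rng or random.Random()
    probs = rom_mask_probabilities(weights, mask_rate)
    return [mask_token if rng.random() < p else tok
            for tok, p in zip(tokens, probs)]


# Hypothetical term weights: stop-words and punctuation get weight 0.
tokens = ["the", "capital", "of", "france", "is", "paris", "."]
weights = [0.0, 2.0, 0.0, 3.0, 0.0, 3.0, 0.0]

probs = rom_mask_probabilities(weights)
masked = rom_mask(tokens, weights, rng=random.Random(0))
```

Under this scaling, tokens with zero weight are never masked, while the expected number of masked tokens stays at the same budget as uniform random masking. Any real term-weighting scheme could supply `weights`; the choice of scheme is left open here.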
In this work, we propose a simple method that applies a large language model (LLM) to large-scale re...
Deep biasing for the Transducer can improve the recognition performance of rare words or contextual ...
Text retrieval is a long-standing research topic on information seeking, where a system is required ...
Retriever-reader models achieve competitive performance across many different NLP tasks such as open...
Dense passage retrieval aims to retrieve the relevant passages of a query from a large corpus based ...
Pre-training a language model and then fine-tuning it for downstream tasks has demonstrated state-of...
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural l...
Masked language models conventionally use a masking rate of 15% due to the belief that more masking ...
The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their gen...
Recently, methods have been developed to improve the performance of dense passage retrieval by using...
Word order, an essential property of natural languages, is injected in Transformer-based neural lang...
Masked Language Modeling (MLM) has proven to be an essential component of Vision-Language (VL) pretr...
Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive per...
The advent of contextualised language models has brought gains in search effectiveness, not just whe...
Finetuning Pretrained Language Models (PLM) for IR has been de facto the standard practice since the...