Recent research demonstrates the effectiveness of using pretrained language models (PLMs) to improve dense retrieval and multilingual dense retrieval. In this work, we present a simple but effective monolingual pretraining task called contrastive context prediction (CCP) that learns sentence representations by modeling sentence-level contextual relations. By pulling the embeddings of sentences within a local context closer together and pushing random negative samples away, different languages form isomorphic structures, so that sentence pairs in two different languages are automatically aligned. Our experiments show that model collapse and information leakage occur easily during contrastive training of language models, but language-specific mem...
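The contrastive objective this abstract describes (context sentences as positives, random sentences as negatives) can be illustrated with a minimal InfoNCE-style sketch using in-batch negatives. This is an assumption-laden illustration, not the authors' implementation: the function name ccp_contrastive_loss, the temperature value, and the use of cosine similarity are all hypothetical choices.

```python
# Minimal sketch of the contrastive objective described in the CCP abstract
# above: the embedding of a sentence and that of a sentence from its local
# context are pulled together, while the other sentences in the batch serve
# as random negatives. All names and hyperparameters here are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F

def ccp_contrastive_loss(anchor_emb: torch.Tensor,
                         context_emb: torch.Tensor,
                         temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss over a batch of (sentence, context-sentence) pairs.

    anchor_emb, context_emb: (batch, dim) sentence embeddings. Row i of
    context_emb comes from the local context of row i of anchor_emb; all
    other rows act as in-batch negatives.
    """
    anchor = F.normalize(anchor_emb, dim=-1)
    context = F.normalize(context_emb, dim=-1)
    # (batch, batch) cosine-similarity matrix, scaled by the temperature.
    logits = anchor @ context.t() / temperature
    # The matching context sentence is the target "class" for each anchor.
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```

Pulling only on local-context positives while sampling negatives from the whole batch is what would encourage the isomorphic embedding structure across languages that the abstract claims.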
Multi-encoder models are a broad family of context-aware Neural Machine Translation (NMT) systems th...
We propose a new unified framework for monolingual (MoIR) and cross-lingual information retrieval (C...
Thesis (Master's)--University of Washington, 2020. This work presents methods for learning cross-lingu...
Recently, information retrieval has seen the emergence of dense retrievers, based on neural networks...
When pre-trained on large unsupervised textual corpora, language models are able to store and retri...
Multilingual sentence embeddings capture rich semantic information not only for measuring similarity...
Massively multilingual sentence representation models, e.g., LASER, SBERT-distill, and LaBSE, help s...
The scarcity of labeled training data across many languages is a significant roadblock for multiling...
We propose a new model for learning bilingual word representations from non-parallel document-aligne...
We present a neural architecture for cross-lingual mate sentence retrieval which encodes sentences i...
In this paper, we present a thorough investigation on methods that align pre-trained contextualized ...
Recently, methods have been developed to improve the performance of dense passage retrieval by using...
In this work we propose a translation model for monolingual sentence retrieval, along with four meth...
Large-scale models for learning fixed-dimensional cross-lingual sentence representations like LASER ...
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NM...