International audience. On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models like BERT, have proven highly effective. Because of the quadratic complexity of the self-attention mechanism, however, such models have difficulty processing long documents. Recent approaches to this issue include truncating long documents, which loses potentially relevant information; segmenting them into several passages, which may miss some information and incurs high computational cost when the number of passages is large; or modifying the self-attention mechanism to make it sparser, as in sparse-attention models, at the risk again of mis...
Since their release, Transformers have revolutionized many fields from Natural Language Understandin...
The task in text retrieval is to find the subset of a collection of documents relevant to a user's ...
Recent research has shown that long documents are unfairly penalised by a number of current retrieva...
On a wide range of natural language processing and information retrieval tasks, transformer-based mo...
Transformer-based models, and especially pre-trained language models like BERT...
The recent literature in text classification is biased towards short text sequences (e.g., sentences...
Dense retrieval, which describes the use of contextualised language models such as BERT to identify ...
Information extraction systems extract structured data from natural language text, to support richer...
This thesis covers topics relevant to information organization and retrieval. The main objective of ...
Because of the world wide web, information retrieval systems are now used by millions of untrained u...
During this internship, we worked on improving an open domain question answeri...
Retrieval with extremely long queries and documents is a well-known and challenging task in informat...
Transformer-based architectures in natural language processing force input size limits that can be p...
The performance of text classification methods has improved greatly over the last decade for text in...
Building on previous work in the field of language modeling information retrieval (IR), this paper p...