Transformer-based Language Models are widely used in Natural Language Processing related tasks. Thanks to their pre-training, they have been successfully adapted to Information Extraction in business documents. However, most pre-training tasks proposed in the literature for business documents are too generic and not sufficient to learn more complex structures. In this paper, we use LayoutLM, a language model pre-trained on a collection of business documents, and introduce two new pre-training tasks that further improve its capacity to extract relevant information. The first is aimed at better understanding the complex layout of documents, and the second focuses on numeric values and their order of magnitude. These tasks force the model to l...
The field of service automation is progressing rapidly, and increasingly complex tasks are being aut...
Key information extraction (KIE) from document images requires understanding the contextual and spat...
A definition for a document type within an organization represents an organizational norm about the ...
Like for many text understanding and generation tasks, pre-trained languages m...
Extracting information from documents usually relies on natural language processing methods working ...
Due to the massive and increasing amount of documents received each day and the number of steps to p...
Building document-grounded dialogue systems has received growing interest as documents convey a wea...
The present paper is focused on information extraction from key fields of invoices using two differe...
The emergence of Large Language Models (LLMs) has boosted performance and possibilities in various N...
Deep Learning (DL) is dominating the fields of Natural Language Processing (NLP) and Computer Vision...
The predominant approaches for extracting key information from documents resor...
Understanding visually-rich business documents to extract structured data and automate business work...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Understanding documents with rich layouts is an essential step towards information extraction. Busin...