Transformer-based Language Models are widely used in Natural Language Processing (NLP) tasks. Thanks to their pre-training, they have been successfully adapted to Information Extraction in business documents. However, most pre-training tasks proposed in the literature for business documents are too generic and insufficient for learning more complex structures. In this paper, we use LayoutLM, a language model pre-trained on a collection of business documents, and introduce two new pre-training tasks that further improve its capacity to extract relevant information. The first aims at better understanding the complex layout of documents, and the second focuses on numeric values and their order of magnitude. These task...
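The LayoutLM abstract above mentions a pre-training task on numeric values and their order of magnitude but does not describe how it is formulated. As a purely illustrative sketch, assuming the task is framed as predicting a discrete order-of-magnitude class for each numeric token, the snippet below derives such labels. The function names `magnitude_label` and `magnitude_labels`, the class range `MIN_EXP`/`MAX_EXP`, and the token-cleaning rules are hypothetical and are not taken from the paper.

```python
import re
from decimal import Decimal
from typing import List

# Hypothetical label conventions, not the paper's.
IGNORE_INDEX = -100        # same value as PyTorch CrossEntropyLoss's default ignore_index
MIN_EXP, MAX_EXP = -2, 6   # assumed range of magnitudes: 10^-2 .. 10^6

_NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def magnitude_label(token: str) -> int:
    """Return an order-of-magnitude class for a numeric token, else IGNORE_INDEX."""
    # Assumption: a leading currency symbol is dropped, ',' is a thousands
    # separator and '.' is the decimal mark (European formats are not handled).
    cleaned = token.strip().lstrip("$€£").strip().replace(",", "")
    if not _NUMBER.fullmatch(cleaned):
        return IGNORE_INDEX  # non-numeric tokens do not contribute to the loss
    value = Decimal(cleaned)
    if value == 0:
        exponent = MIN_EXP  # zero has no magnitude; map it to the lowest class
    else:
        # adjusted() is the exponent of the most significant digit,
        # i.e. floor(log10(|value|)) computed exactly, without float rounding.
        exponent = value.adjusted()
    exponent = max(MIN_EXP, min(MAX_EXP, exponent))  # clip out-of-range values
    return exponent - MIN_EXP  # shift so class indices start at 0


def magnitude_labels(tokens: List[str]) -> List[int]:
    """Label every token of a (hypothetical) OCR token sequence."""
    return [magnitude_label(t) for t in tokens]
```

For example, `magnitude_labels(["Total", "$1,234.50", "0.05", "12000000"])` returns `[-100, 5, 0, 8]`: the word is masked out, and 12,000,000 is clipped to the highest assumed class. Using `Decimal.adjusted()` rather than `math.log10` keeps exact powers of ten from being misclassified by floating-point rounding.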
Natural Language Processing has become highly important in research and business applications. The...
The ability to automatically read, recognize, and extract different information from unstructured te...
The major goal of the READ project was the electronic processing of paper-bound information in relevant a...
As with many text understanding and generation tasks, pre-trained language m...
Extracting information from documents usually relies on natural language processing methods working ...
The predominant approaches for extracting key information from documents resor...
The present paper focuses on information extraction from key fields of invoices using two differe...
The field of service automation is progressing rapidly, and increasingly complex tasks are being aut...
Key information extraction (KIE) from document images requires understanding the contextual and spat...
Due to the massive and growing volume of documents received each day and the number of steps to p...
The day-to-day working of an organization produces a massive volume of unstructured data in the form...
This chapter presents a model for knowledge extraction from documents written in natural language. T...
A great deal of work has been done in the past on natural language recognition within the field of a...