Key information extraction (KIE) from document images requires understanding the contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent studies approach the task by developing pre-trained language models that combine visual features from document images with texts and their layout. In contrast, this paper tackles the problem by going back to the basics: an effective combination of text and layout. Specifically, we propose a pre-trained language model, named BROS (BERT Relying On Spatiality), that encodes relative positions of texts in 2D space and learns from unlabeled documents with an area-masking strategy. With this optimized training scheme for understanding texts in 2D space, BROS shows compar...
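To make the two ideas named in the abstract more concrete, the following is a minimal sketch and not the BROS implementation. It assumes text blocks come with axis-aligned bounding boxes in a hypothetical `(x_min, y_min, x_max, y_max)` format, represents relative position as the offset between box centers, and treats area masking as masking every token whose box center falls inside a chosen rectangle. All function names and the `[MASK]` token are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of an axis-aligned box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def relative_positions(boxes: List[Box]) -> List[List[Tuple[float, float]]]:
    """rel[i][j] is the (dx, dy) offset from the center of box i to box j.

    Assumption: the actual model may encode relative positions differently.
    """
    centers = [center(b) for b in boxes]
    return [
        [(xj - xi, yj - yi) for (xj, yj) in centers]
        for (xi, yi) in centers
    ]


def area_mask(boxes: List[Box], area: Box) -> List[int]:
    """Return indices of boxes whose center lies inside the given area."""
    ax0, ay0, ax1, ay1 = area
    selected = []
    for i, box in enumerate(boxes):
        cx, cy = center(box)
        if ax0 <= cx <= ax1 and ay0 <= cy <= ay1:
            selected.append(i)
    return selected


def apply_mask(tokens: List[str], masked: List[int],
               mask_token: str = "[MASK]") -> List[str]:
    """Replace the tokens at the masked indices with a mask token."""
    masked_set = set(masked)
    return [mask_token if i in masked_set else t
            for i, t in enumerate(tokens)]


# Toy document: three text blocks laid out on a page.
tokens = ["Total", "42", "Date"]
boxes = [(0, 0, 2, 2), (4, 0, 6, 2), (0, 4, 2, 6)]

rel = relative_positions(boxes)
masked_tokens = apply_mask(tokens, area_mask(boxes, (0, 0, 3, 3)))
```

In this toy layout, `rel[0][1]` is the offset from "Total" to "42", which sit on the same row, and the chosen area covers only the first block. Masking by region rather than by individual token means the masked tokens are spatially adjacent, so they would have to be recovered from context elsewhere on the page.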
The current trend in object detection and localization is to learn predictions...
The area of scene text recognition focuses on the problem of recognizing arbitrary text in images of...
We present a general approach for the hierarchical segmentation and labeling of document layout stru...
Like for many text understanding and generation tasks, pre-trained languages m...
Key Information Extraction (KIE) is aimed at extracting structured information (e.g. key-value pairs...
Transformer-based pre-training techniques of text and layout have proven effec...
Transformer-based Language Models are widely used in Natural Language Processi...
Extracting information from documents usually relies on natural language processing methods working ...
We present a framework to analyze color documents of complex layout. In addition, no assumption is m...
The purpose of document layout analysis is to locate textlines and text regions in docum...
Visually-situated language is ubiquitous -- sources range from textbooks with diagrams to web pages ...
Transformer-based Language Models are widely used in Natural Language Processing related tasks. Than...
The page, whether on screen or on paper, constitutes a well circumscribed space constructed and regu...
The current spread of digital documents raised the need of effective content-based retrieval techni...