Key Information Extraction (KIE) aims to extract structured information (e.g. key-value pairs) from form-style documents (e.g. invoices), an important step towards intelligent document understanding. Previous approaches generally tackle KIE by sequence tagging, which has difficulty processing non-flattened sequences, especially in documents that mix tables and text. These approaches also require pre-defining a fixed set of labels for each document type and suffer from label imbalance. In this work, we assume Optical Character Recognition (OCR) has been applied to the input documents, and reformulate the KIE task as a region prediction problem in two-dimensional (2D) space given a target field. Follow...
Digitization of newspapers is of interest for many reasons including preservation of history, access...
Legal documents often have a complex layout with many different headings, headers and footers, side ...
This article describes the work performed in the Pattern Redundancy Analysis f...
Key information extraction (KIE) from document images requires understanding the contextual and spat...
Precise description of layout entities (content regions on a page) is crucial for all but the most t...
The current spread of digital documents has raised the need for effective content-based retrieval techni...
Digitization of paper-bound documents is one of the foremost commercial interests worldwi...
Understanding documents with rich layouts is an essential step towards information extraction. Busin...
Form document analysis is one of the most essential tasks in document analysis and recog...
In this article, we show how some concepts found in traditional and old layout practices used to lay...
Text/Image region separation is the process of identifying the location of various text and imag...
We describe a new approach for evaluating page segmentation algorithms. Unlike techniques that rely ...
Unconstrained handwritten document recognition is a challenging computer vision task. It is traditio...
Extracting information from documents usually relies on natural language processing methods working ...
The availability of large, heterogeneous repositories of electronic documents is increasing rapidly,...