In this paper, we present our contribution to the FinTOC-2022 Shared Task "Financial Document Structure Extraction". We participated in the three tracks dedicated to English, French and Spanish document processing. Our main contribution is to treat a financial prospectus as a bundle of documents, i.e., a set of merged documents, each with its own layout and structure. Document Layout and Structure Analysis (DLSA) therefore begins by detecting the boundaries of each document using general layout features. The analysis is then applied within each individual document, taking advantage of its local properties. DLSA simultaneously considers text content, vector shapes and images embedded in the ...
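The bundle idea described above can be illustrated with a minimal sketch. It assumes each page is summarized by a few general layout features and that a new document starts wherever those features change between consecutive pages. The `PageLayout` fields, the `tol` threshold and both function names are hypothetical choices made for illustration; the abstract does not specify the authors' actual features or detection method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageLayout:
    # Hypothetical per-page layout features (assumed, not the authors' feature set).
    width: float
    height: float
    main_font_size: float


def find_boundaries(pages: List[PageLayout], tol: float = 0.5) -> List[int]:
    """Return the indices of pages that start a new document in the bundle."""
    starts = [0] if pages else []
    for i in range(1, len(pages)):
        prev, cur = pages[i - 1], pages[i]
        # A page starts a new document if any layout feature shifts by more than tol.
        changed = (
            abs(cur.width - prev.width) > tol
            or abs(cur.height - prev.height) > tol
            or abs(cur.main_font_size - prev.main_font_size) > tol
        )
        if changed:
            starts.append(i)
    return starts


def split_bundle(pages: List[PageLayout], tol: float = 0.5) -> List[List[PageLayout]]:
    """Split the page sequence into per-document page lists."""
    starts = find_boundaries(pages, tol)
    ends = starts[1:] + [len(pages)]
    return [pages[s:e] for s, e in zip(starts, ends)]
```

Each sub-list returned by `split_bundle` could then be analyzed on its own, so that later steps such as heading detection rely on the local properties of a single document rather than on the whole bundle.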
When reading a document, we intuitively have a first global approach in order to de...
In this paper we present the evaluation of our automatic methods for detecting and extracting docume...
Legal documents often have a complex layout with many different headings, headers and footers, side ...
In this paper, we present our contribution to the FinTOC-2022 Shared Task "Fin...
In this paper, we present our contribution to the FinTOC-2021 Shared Task "Fin...
This paper describes the FinTOC-2022 Shared Task on the structure extraction from financial document...
This paper presents the FinTOC-2020 Shared Task on structure extraction from financial documents, it...
This paper presents the FinTOC-2021 Shared Task on structure extraction from financial documents, it...
In this paper, we present the method we have designed and implemented for identifying lists and ...
In this paper, we present the method we have designed and implemented for iden...
We present different methods for the two tasks of the 2019 FinTOC challenge: T...
A document image is composed of a variety of physical entities or regions such as text blocks, lines...
Portable Document Format (PDF) has become the industry-standard document as it is independent of the...
This article presents Xed, a reverse engineering tool for PDF documents, which extracts the original...
In this paper we present a system, DoLSuD, for the automatic discovery of relevant substructures in ...