In this paper, we present our contribution to the FinTOC-2021 Shared Task "Financial Document Structure Extraction". We participated in the tracks dedicated to English and French document processing. Our results for title detection and TOC generation show good precision. We address the problem in a fairly unusual but ambitious way: we consider the text content, vector shapes, and images embedded in the native PDF document simultaneously, and we structure the document in its entirety.
Abstract. Accessing the structured content of a PDF document is a difficult task, requiring pre-proces...
In the era of digitization, the vast volume of scientific publications has become readily accessible...
Abstract. We present a progress report on our ongoing project of reverse engineering scientific PDF...
In this paper, we present our contribution to the FinTOC-2022 Shared Task "Fin...
This paper describes the FinTOC-2022 Shared Task on the structure extraction from financial document...
This paper presents the FinTOC-2021 Shared Task on structure extraction from financial documents, it...
This paper presents the FinTOC-2020 Shared Task on structure extraction from financial documents, it...
We present different methods for the two tasks of the 2019 FinTOC challenge: T...
In this paper, we present the method we have designed and implemented for iden...
In this paper, we present the method we have designed and implemented for identifying lists and ...
Paper presented at the Language Resources and Evaluation Conference (LREC) 2018, held on ...
Nowadays PDF documents have become a dominant knowledge repository for both academia and indus...
A strategy for document analysis is presented which uses Portable Document Format (PDF the underlyin...
The PDF format plays a crucial role in the field of electronic academic literature publishing, but d...
Portable Document Format (PDF) has become the industry-standard document as it is independent of the...