In this paper, we present the method we designed and implemented for identifying lists and sentences in PDF documents while participating in the FinSBD-2 Financial Document Analysis Shared Task. We propose a model-driven approach for the French and English datasets. It relies on a top-down process that starts from the PDF itself in order to keep control of the workflow. Our objective is to use PDF structure extraction to improve text segment boundary detection in an end-to-end fashion.
Nowadays PDF documents have become a dominant knowledge repository for both academia and indus...
This paper presents the FinTOC-2021 Shared Task on structure extraction from financial documents, it...
This paper addresses the problem of extracting and segmenting references from PDF documents. The nov...
Portable Document Format (PDF) has become the industry-standard document as it is independent of the...
Given the growth of scientific literature on the web, particularly material science, acquiring data ...
In this paper, we present our contribution to the FinTOC-2022 Shared Task "Fin...
In this paper, we present our contribution to the FinTOC-2021 Shared Task "Fin...
The article discusses the algorithms for detecting and extracting lines, paragraphs with their prope...
This paper describes the FinTOC-2022 Shared Task on the structure extraction from financial document...
Documents in PDF format are nowadays called the Universal document format. PDF to speech converter s...
Text preprocessing and segmentation are critical tasks in search and text mining applications. Due t...
With the rapid growth of scientific documents in digital libraries, the search demands for the doc...
This paper presents the FinTOC-2020 Shared Task on structure extraction from financial documents, it...