This paper tackles the task of named entity recognition (NER) applied to digitized historical texts obtained from processing digital images of newspapers with optical character recognition (OCR) techniques. We argue that the main challenge for this task is that the OCR process introduces misspellings and linguistic errors into the output text. Moreover, aged documents can contain historical variations, which can degrade the performance of the NER process. We conduct a comparative evaluation against previous state-of-the-art models on two historical datasets, in German and French, and we propose a model based on a hierarchical stack of Transformers to address NER on historical data. Our findings show that the proposed model clear...
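The claim that OCR misspellings hinder NER can be made concrete with a small sketch. The Python below assumes a hypothetical list of known place names (`KNOWN`) and a hypothetical OCR token in which "e" was misread as "c". It shows that an exact lookup misses the entity, while a Levenshtein-distance lookup with a one-edit tolerance recovers it. This only illustrates the problem; it is not the hierarchical Transformer model the abstract proposes.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def match_entity(token: str, known: List[str], max_dist: int = 0) -> Optional[str]:
    """Return the closest known name if it lies within max_dist edits, else None."""
    best = min(known, key=lambda name: levenshtein(token, name))
    return best if levenshtein(token, best) <= max_dist else None


# Hypothetical gazetteer and OCR output ("e" misread as "c").
KNOWN = ["Bern", "Berlin", "Lausanne"]
ocr_token = "Bcrlin"

exact = match_entity(ocr_token, KNOWN)              # exact match only
fuzzy = match_entity(ocr_token, KNOWN, max_dist=1)  # tolerate one edit
print(exact, fuzzy)
```

Running it prints `None Berlin`. Edit-distance matching is only one simple way to tolerate OCR noise; the threshold `max_dist` trades recall against false matches between similar names.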
The accessibility of digitized documents in digital libraries is greatly affected by the quality of ...
We propose new methods for in-domain and cross-domain Named Entity Recognition (NER) on historical d...
(National audience) This paper tackles the task of NER applied to historical texts obtained from proces...
(International audience) Named entity recognition (NER) is a necessary step in many pipelines targeting...
In recent years, many cultural institutions have engaged in large-scale newspaper digitization proje...
(International audience) Named entity recognition is of high interest to digital humanities, in particu...
Thesis (Master's)--University of Washington, 2019. The field of digital humanities has spurred an incr...
About NER models created for the evaluation of Optical Character Recognition (OCR) and Named Entity ...
Recognition and identification of real-world entities is at the core of virtually any text mining ap...
Named entity recognition (NER), search, classification and tagging of names and name-like frequent i...
Named entities (NEs) are among the most relevant type of information that can be used to efficiently...