The Euronews XML corpus comprises the transcription and XML encoding of handwritten newsletters dating from 1550 to 1730 and preserved today in the Florence State Archives. Manuscript newsletters, called avvisi in Italian, are a Renaissance invention: usually anonymous sheets, reproduced in multiple copies, which eventually became the basis of the first printed journalism. The Euronews project team developed a methodology to encode this type of early modern informational source and to create a corpus usable for data analytics. The transcription and XML encoding guidelines are explained in detail at https://github.com/lallori/euronews-xml-corpus/wiki/transcription-xml-encoding-guidelines The main langua...
This paper deals with a four-month-long experiment led by two Latin scholars f...
This dataset accompanies the publication of "Twenty-two Historical Encyclopedias Encoded in TEI: a N...
This corpus consists of 2110 PDF Files and 2110 XML files with the text extracted from the PDF files...
The recent work of the Text Encoding Initiative (TEI) in developing an XML standard for manuscript d...
The data presented here is a set of 326 XML documents containing encoded transcriptions of the indiv...
Digital humanities offer new possibilities to do historical research on journalism in the Modern age...
This article describes the process we used for transcribing and analyzing a co...
How can we use topic modelling to study early modern handwritten news sheets? This is a blog post th...
Corpus ccGigafida consists of paragraph samples from 31,722 documents, each containing information a...
MIA (Medici Interactive Archive - http://mia.medici.org) is a community-sourcing research portal, de...
I work on a project called “Digital edition of historical manuscripts” which aims to diffuse on a pu...
The European Union institutions are increasingly present in the lives of European citizens, particul...
This article is a revised and extended version of [VBG, 07]. We conjecture that the digitalization o...
CAMENA - Latin Texts of Early Modern Europe was a great project at the University of Mannheim. The p...
In conventional approaches to computer analysis of historical sources, one must represent the data i...