Extracting and parsing reference strings from research articles is a challenging task. State-of-the-art tools like GROBID apply rather simple machine learning models such as conditional random fields (CRF). Recent research has shown the high potential of deep learning for reference string parsing. The challenge with deep learning, however, is that training requires enormous amounts of labeled data, which do not exist for reference string parsing. Creating such a large dataset manually seems hardly feasible. Therefore, we created GIANT, a large dataset with 991,411,100 XML-labeled reference strings. The strings were automatically created based on 677,000 entries from CrossRef, 1,500 citation styles ...
Information retrieval systems for scholarly literature rely heavily not only on text matching but on...
In this paper, we present new bibliographical reference corpora in digital humanities (DH) that have...
Automatically extracted metadata from scholarly documents in PDF formats is usually noisy and hetero...
Accurately parsing citation strings is key to automatically building large-scale citation graphs, so...
We consider the task of reference mining: the detection, extraction and classification of references...
Citations are an important part of scientific papers, and the proper handling of them is indispensab...
Pre-trained word vectors of dimensionality 100 and 300 for the publication: Deep Reference Mining fr...
Predicting the number of citations of scholarly documents is an upcoming task in scholarly document ...
Recent advancements in information retrieval systems significantly rely on the context-based feature...
Categorization of semantic relationships between scientific papers is a key to...
The combined impact of new computing resources and techniques with an increasing avalanche of large ...
In the process of Systematic Literature Review, citation screening is estimated to be one of the mos...
The impact and significance of a scientific publication is measured mostly by the number of citation...