These datasets were produced as part of my thesis project. The thesis (written in Czech) explores the application of approximate string matching to record linkage of scientific publication records. It provides an introduction to record matching along with five commonly used string distance metrics (Levenshtein, Jaro, Jaro-Winkler and cosine distances, and the Jaccard coefficient). These metrics are applied to publication metadata from V3S, the current research information system of the Czech Technical University in Prague. Based on the findings, optimal thresholds with respect to the F1, F2 and F3 measures are determined for each metric. Thesis citation: DOBIÁŠOVSKÝ, Jan. Approximate equality of character strings and its application to record linkage in metadata of scient...
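As a rough illustration of the threshold search described above, the following is a minimal Python sketch, not the code used in the thesis. It assumes a normalized Levenshtein similarity as the metric, and the function names (`levenshtein`, `similarity`, `f_beta`, `best_threshold`) and the toy labelled pairs are hypothetical, introduced here only for the example.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the standard two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def f_beta(tp: int, fp: int, fn: int, beta: float) -> float:
    """F-beta measure; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denominator == 0 else (1 + b2) * tp / denominator


def best_threshold(pairs: Sequence[Tuple[str, str, bool]],
                   beta: float,
                   candidates: Sequence[float]) -> Tuple[float, float]:
    """Return the (threshold, score) pair that maximizes F-beta on labelled pairs."""
    scored: List[Tuple[float, bool]] = [(similarity(a, b), match)
                                        for a, b, match in pairs]
    best = (candidates[0], -1.0)
    for threshold in candidates:
        tp = sum(1 for s, m in scored if s >= threshold and m)
        fp = sum(1 for s, m in scored if s >= threshold and not m)
        fn = sum(1 for s, m in scored if s < threshold and m)
        score = f_beta(tp, fp, fn, beta)
        if score > best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: (title A, title B, is the pair a true match?)
toy_pairs = [("abc", "abc", True), ("abc", "xyz", False)]
threshold, score = best_threshold(toy_pairs, beta=1.0, candidates=[0.5])
```

Calling `best_threshold` with `beta` set to 1, 2 or 3 would correspond to the F1, F2 and F3 measures; higher values of `beta` favour recall, which tends to push the selected threshold lower.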
The main aim of this thesis is to study genealogical data, to find out possible problems in their me...
Many database applications require similarity based retrieval on stored text and/or multimedia objec...
Information Extraction is concerned with discovering entities, relationships and events from text. B...
For research evaluation, publication lists need to be matched to entries in large bibliographic data...
Mathiak, Brigitte, Boland, Katarina. Challenges in Matching Dataset Citation Strings to Datasets in ...
This paper presents a text matching process for identification and correct assignment of scholarly p...
The paper proposes matching short forms (abbreviated titles from the citation report) with their cor...
String similarity measures play an increasingly important role in text related research and applicatio...
We compare variations of string comparators based on the Jaro-Winkler comparator and edit distance c...
Linked data has been widely recognized as an important paradigm for representing data and one of the...
Frequent successful publications by specific institutions are indicators for identifying outstanding...
This paper presents results of the numerous experiments on usability of well-established string dist...
We describe an open-source Java toolkit of methods for matching names and records. We summarize res...
ISI publication article. We survey the current techniques to cope with the problem of string matc...
This paper proposes to perform authorship analysis using the Fast Compression Distance (FCD), a simi...