The objective of this study is to find the most appropriate parameters and text components for item-wise matching of two large bibliographic datasets: Clarivate Analytics Web of Science (WoS) and Elsevier’s Scopus. Our focus is on detecting exact matches; that is, no false positives are tolerated. To this end, we follow a twofold matching procedure. First, a locality-sensitive hashing (LSH) algorithm [15], which provides fast approximate nearest neighbours and similarities, is applied to obtain WoS-Scopus pair suggestions. We experiment with three different combinations of text components as input for the matching process: publication titles only; titles + journal names; and co-author names + titles + journals. In additi...
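To make the first step concrete, the following is a minimal sketch of LSH-based pair suggestion. It assumes MinHash signatures over character 3-grams with banding, which is one common LSH family and not necessarily the algorithm of [15]. The field names (`authors`, `title`, `journal`), the helper names, the parameter values, and the sample records are all hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import product

# Hypothetical field combinations mirroring the three input variants in the text.
TITLE_ONLY = ("title",)
TITLE_JOURNAL = ("title", "journal")
AUTHORS_TITLE_JOURNAL = ("authors", "title", "journal")


def build_text(record, fields):
    """Concatenate the chosen text components of one record."""
    return " ".join(record.get(f, "") for f in fields)


def shingles(text, k=3):
    """Return the set of character k-grams of a lower-cased, whitespace-normalised string."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + k] for i in range(max(len(norm) - k + 1, 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, salted with the seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(tokens, num_perm=64):
    """MinHash signature: the minimum hash value under each of num_perm seeds."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_perm))


def candidate_pairs(left, right, num_perm=64, bands=16):
    """Suggest (left_id, right_id) pairs whose signatures agree on at least one band."""
    rows = num_perm // bands
    buckets = defaultdict(lambda: (set(), set()))
    for side, records in enumerate((left, right)):
        for rec_id, text in records.items():
            sig = minhash(shingles(text), num_perm)
            for b in range(bands):
                key = (b, sig[b * rows:(b + 1) * rows])
                buckets[key][side].add(rec_id)
    pairs = set()
    for left_ids, right_ids in buckets.values():
        pairs.update(product(left_ids, right_ids))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity of two sets, used to score a suggested pair."""
    return len(a & b) / len(a | b)


# Made-up sample records; real WoS and Scopus exports have different schemas.
wos = {"W1": {"title": "Locality sensitive hashing for record matching",
              "journal": "Journal of Examples"}}
scopus = {"S1": {"title": "Locality-sensitive hashing for record matching",
                 "journal": "Journal of Examples"}}

fields = TITLE_JOURNAL
left = {k: build_text(v, fields) for k, v in wos.items()}
right = {k: build_text(v, fields) for k, v in scopus.items()}
for l_id, r_id in sorted(candidate_pairs(left, right)):
    score = jaccard(shingles(left[l_id]), shingles(right[r_id]))
    print(l_id, r_id, round(score, 3))
```

The pairs returned here are only suggestions. Because banding trades recall against the number of candidates, the number of bands and the signature length are the kind of parameters that would need tuning, and any suggested pair still needs a stricter check before it is accepted as an exact match.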
Approximate Nearest Neighbor (ANN) search in high dimensional space has become a fundamental paradi...
Similarity search is critical for many database applications, including the increasingly p...
Schema matching is a critical problem for integrating heterogeneous information sources. Traditional...
A novel hashing algorithm is applied to match two prominent and important bibliographic databases at...
Many modern applications of AI such as web search, mobile browsing, image processing, and natural la...
This paper presents a text matching process for identification and correct assignment of scholarly p...
This paper reports initial research results related to the use of locality-sensitive hashing (LSH) ...
Metagenomic studies produce large datasets that are estimated to grow at a faster rate than the avai...
This lecture note describes a technique known as locality-sensitive hashing (LSH) that allows one to...
Research Doctorate - Doctor of Philosophy (PhD). This thesis presents techniques for accelerating simi...
Approximate Nearest Neighbor (ANN) search in high dimensional space has become a fundamental paradig...