Record matching is the task of identifying records that refer to the same real-world entity. Detecting records that are approximate duplicates is an important task: datasets may contain duplicate records for the same real-world entity because of data entry errors, unstandardized abbreviations, or differences in the detailed schemas of records drawn from multiple databases. This paper describes a record matching algorithm for publication datasets based on the multi-pass sorted neighborhood method. It detects duplicated data in a publication XML database and, by sorting on a different key in each pass, produces a higher percentage of correct duplicates and a lower percentage of false positives. A multi-pass approach is used, which is based on the co...
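The multi-pass sorted neighborhood method mentioned above can be sketched as follows: each pass sorts the records on one blocking key and compares only records inside a sliding window, and the candidate pairs from all passes are unioned. This is a minimal illustrative sketch, not the paper's implementation; the record fields, key functions, window size, and field-equality similarity measure below are assumptions chosen for the example.

```python
"""Minimal sketch of the multi-pass sorted-neighborhood method.

Illustrative only: fields, keys, window size, and the similarity
threshold are assumptions, not the paper's actual choices.
"""

def similar(a, b, threshold=0.8):
    """Toy similarity: fraction of matching non-id fields."""
    fields = [k for k in a if k != "id"]
    same = sum(a[k] == b[k] for k in fields)
    return same / len(fields) >= threshold

def sorted_neighborhood(records, key_fn, window=3):
    """One pass: sort on a blocking key, compare within a sliding window."""
    ordered = sorted(records, key=key_fn)
    pairs = set()
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1 : i + window]:
            if similar(rec, other):
                # store ids in canonical order so passes can be merged
                pairs.add(tuple(sorted((rec["id"], other["id"]))))
    return pairs

def multi_pass(records, key_fns, window=3):
    """Union the candidate pairs found by several independent sorting keys."""
    matches = set()
    for key_fn in key_fns:
        matches |= sorted_neighborhood(records, key_fn, window)
    return matches

records = [
    {"id": 1, "author": "smith j", "title": "record matching"},
    {"id": 2, "author": "smith j", "title": "record matching"},  # duplicate of 1
    {"id": 3, "author": "jones a", "title": "data cleaning"},
]
keys = [lambda r: r["author"], lambda r: r["title"]]
print(multi_pass(records, keys))  # {(1, 2)}
```

Sorting on multiple independent keys is what reduces false negatives: a typo that breaks the sort order under one key (e.g. in the author field) is usually recovered by another pass that sorts on a different key.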
We consider the problem of duplicate detection in noisy and incomplete data: given a large data set ...
In this paper, a robust filtering technique, called PC-Filter (PC stands for partition comparison), ...
Often, in the real world, entities have two or more representations in databases. Duplicate records ...
Data quality often manifests itself as inconsistencies between systems or inconsistencies with real...
Abstract — The paper describes an algorithm for automatic record matching in cooperative information...
Data matching (also known as record or data linkage, entity resolution, object identification, or fi...
Record matching refers to the task of finding entries that refer to the same entity in two or more f...
The recognition of similar entities in databases has gained substantial attention in many applica...
Abstract: Record linkage is the technique of finding matching data in a collection of databases that has ...