The paper describes a fault-tolerant method for identifying duplicate bibliographic records in catalogues. The method is based on text algorithms; candidate duplicates are suggested to librarians, who make the final decision. The method was applied to four library catalogues at the Warsaw University of Technology, which were compared with the catalogue of the main library. The process of merging the catalogues was handled differently for non-duplicate records and for duplicates. Thanks to this method, a significant portion of the records in the catalogues of the joining libraries was identified as duplicates before the catalogues were merged. The algorithms proved helpful in assuring a high quality of information.
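To make the idea concrete, below is a minimal Python sketch of the kind of text-similarity comparison such a semi-automatic workflow could use. The paper does not specify its text algorithms, so the similarity measure (the standard library's difflib.SequenceMatcher), the title/author record fields, and the 0.9 threshold are all illustrative assumptions; candidate pairs are only suggested, and the final decision stays with a librarian.

    from difflib import SequenceMatcher

    def normalize(field):
        # Lowercase and collapse whitespace so trivial formatting
        # differences do not count against the match (an assumed,
        # illustrative normalization step).
        return " ".join(field.lower().split())

    def similarity(a, b):
        # Similarity ratio in [0, 1] from the standard library's
        # SequenceMatcher.
        return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

    def candidate_duplicates(records, threshold=0.9):
        # Compare every pair of records on a combined title+author key
        # and yield pairs scoring above the (assumed) threshold. The
        # pairs are suggestions for a librarian, not automatic merges.
        for i, r1 in enumerate(records):
            for r2 in records[i + 1:]:
                key1 = r1["title"] + " " + r1["author"]
                key2 = r2["title"] + " " + r2["author"]
                score = similarity(key1, key2)
                if score >= threshold:
                    yield r1, r2, score

    # Two near-identical records differing by a typo and punctuation.
    records = [
        {"title": "Introduction to Algorithms", "author": "Cormen, T."},
        {"title": "Introduction to Algoritms", "author": "Cormen T"},
    ]
    for r1, r2, score in candidate_duplicates(records):
        print("possible duplicate (score %.2f): %r / %r"
              % (score, r1["title"], r2["title"]))

This fault tolerance (surviving typos and punctuation differences) is what distinguishes such methods from exact-match deduplication, which would miss the pair above.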
This paper presents an evaluation of different methods for automatic duplicate detection in digitize...
Due to the multiplication of digital bibliographic catalogues (open repositories, library and bookse...
With the rapid development of the World Wide Web, there are a huge number of fully or fragmentally d...
Purpose - The purpose of this paper is to focus on duplicate record detect...
In this project it is shown that, under certain conditions, checking for duplicates while loading ...
A framework is presented for discovering partial duplicates in large collections of scanned books wi...
Often, in the real world, entities have two or more representations in databases. Duplicate records ...
Duplicate records in the Online Union Catalog of the OCLC Online Computer Library Center, Inc., were...
References are the main descriptive metadata used by digital libraries of scientific articles. These...
This thesis deals with the problem of detecting documents that are so similar to one another...
As the amount of books available online grows, the sizes of these collections grow at the same pace...
This paper introduces a framework for clarifying and formalizing the duplicate document detection pr...