Because the rate at which documents are being generated outstrips librarians’ ability to catalog them, an accurate, automated scheme of subject classification is desirable. However, simplistic word-counting schemes miss many important concepts; librarians must enrich algorithms with background knowledge to escape basic problems such as polysemy and synonymy. I have developed a script that uses Wikipedia as context for analyzing the subjects of nonfiction books. Though the method is simple and built quickly from freely available parts, it achieves partial success, suggesting the promise of such an approach for future research.
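To make the core idea concrete, here is a minimal sketch of how Wikipedia-derived background knowledge can resolve polysemy, which plain word counting cannot. The concept names and context words below are illustrative stand-ins for real Wikipedia article titles and article text, not data from the actual script:

```python
from collections import Counter

# Toy Wikipedia-style background knowledge: each concept (an article title)
# maps to a bag of context words drawn from its article. These entries are
# hypothetical examples, not real Wikipedia data.
CONCEPT_CONTEXTS = {
    "Bank (geography)": {"river", "erosion", "sediment", "flood"},
    "Bank (finance)": {"loan", "deposit", "interest", "credit"},
}

def disambiguate(concept_contexts, document_words):
    """Pick the concept whose context words overlap the document most."""
    scores = Counter()
    for concept, context in concept_contexts.items():
        scores[concept] = len(context & document_words)
    best, _ = scores.most_common(1)[0]
    return best

doc = {"the", "river", "flood", "washed", "over", "the", "bank"}
print(disambiguate(CONCEPT_CONTEXTS, doc))  # "Bank (geography)"
```

A bag-of-words counter sees only the ambiguous token "bank"; the overlap with each concept's context words is what selects the geographic sense here. The same overlap scoring, run over all candidate concepts, is one simple way to surface a book's likely subjects.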