Given a written text in natural language, it is convenient to represent the information content of the text by one or more entities, variously known as concepts, keywords, or terms. It is desired to choose "good" terms which collectively reflect the information content as accurately as possible. A characterization is given of discriminating (good) and non-discriminating (bad) terms, based on the document frequencies of occurrence and the distribution of frequencies of the terms in the documents (texts) of a given document collection. Based on this characterization, reasons are presented for the success and/or the failure of some well-known indexing methods, namely thesaurus construction, "weighting" of the rare terms, and the deletion of no...
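The abstract above does not say how a term is judged discriminating. One common formulation in this line of work is the term discrimination value. It is the change in "space density" (the average cosine similarity of documents to the collection centroid) when a term is removed.

The sketch assumes documents are given as term-weight dictionaries. The function names are hypothetical. It illustrates the general idea and is not the procedure of the paper summarized above.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def space_density(docs):
    """Average cosine similarity of each document to the collection centroid."""
    centroid = Counter()
    for d in docs:
        for t, w in d.items():
            centroid[t] += w / len(docs)
    return sum(cosine(d, centroid) for d in docs) / len(docs)


def discrimination_value(docs, term):
    """Density without `term` minus density with it.

    A positive value means the term spreads documents apart (a good
    discriminator); a negative value means it makes them look alike.
    """
    without = [{t: w for t, w in d.items() if t != term} for d in docs]
    return space_density(without) - space_density(docs)


def document_frequency(docs):
    """Number of documents in which each term occurs."""
    return Counter(t for d in docs for t in d)


if __name__ == "__main__":
    # Toy collection with binary weights (an assumption for illustration).
    docs = [
        {"the": 1, "retrieval": 1},
        {"the": 1, "retrieval": 1},
        {"the": 1, "clustering": 1},
        {"the": 1, "clustering": 1},
    ]
    for term, df in sorted(document_frequency(docs).items()):
        print(term, df, round(discrimination_value(docs, term), 3))
```

In this toy collection, the term that occurs in every document gets a negative value. The two terms that occur in half the documents get positive values. This matches the intuition that very common terms are poor discriminators.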
We present an evaluation of domain-independent natural language tools for use in the identification o...
Traditional index weighting approaches for information retrieval from texts depend on the term frequ...
An algorithm for document clustering is introduced. The base concept of the algorithm, Cover Coeffic...
This dissertation introduces a new theoretical model for text classification systems, including syst...
The content analysis, or indexing problem, is fundamental in information storage and retrieval. Sev...
An attempt is made to characterize the usefulness of terms occurring in stored documents and user qu...
The advances in data collection and the increasing amount of unstructured and unlabeled text documen...
Index terms are an important component in considering a scientific topic. In a real sense, ...
The advances in data collection and the increasing amount of unstructured and unlabeled text document...
The occurrence of a word, one or more times, in a document is taken as an attribute of that document...
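Treating a word's occurrence as a document attribute amounts to a binary document-by-word incidence representation. Under it, repeated occurrences carry no extra weight. The following is a minimal sketch, assuming a naive lowercase alphabetic tokenizer. The function names are hypothetical and not taken from the work summarized above.

```python
import re


def binary_attributes(text):
    """Return the set of words that occur at least once in `text`.

    Tokenization (lowercased runs of a-z) is an illustrative assumption.
    Repeats add nothing: a word either is an attribute of the document or is not.
    """
    return set(re.findall(r"[a-z]+", text.lower()))


def incidence_matrix(texts):
    """Build a 0/1 document-by-word matrix over a sorted vocabulary."""
    docs = [binary_attributes(t) for t in texts]
    vocab = sorted(set().union(*docs))
    matrix = [[1 if w in d else 0 for w in vocab] for d in docs]
    return vocab, matrix


if __name__ == "__main__":
    vocab, matrix = incidence_matrix(["index index terms", "terms and weights"])
    print(vocab)
    for row in matrix:
        print(row)
```

"index" appears twice in the first text but still contributes a single 1. The representation records presence, not frequency.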
In this paper we propose an innovative method of categorizing text documents. The proposed method pr...
Topic indexing is the task of identifying the main topics covered by a document. These are useful fo...
The common view of the 'aboutness' of documents is that the index entries (or classificat...