An attempt is made to characterize the usefulness of terms occurring in stored documents and user queries as a function of their frequency characteristics across the documents of a collection. It is found that the best terms are those having medium frequency in the collection and skewed frequency distributions. Correspondingly, terms exhibiting either very high or very low document frequency are not as useful. To improve the indexing vocabulary, it becomes necessary to group low-frequency terms into classes, and to break up high-frequency terms by forming phrases. An indexing theory is described based on term frequency considerations, and a new phrase generation method is introduced. The resulting improvements in the indexing vocabulary are...
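The frequency argument above can be illustrated with a minimal sketch that counts document frequencies, sorts terms into low, medium, and high bands, and proposes candidate phrases built from high-frequency terms. This is an assumption-laden illustration, not the method of the paper: the thresholds `low_max` and `high_min`, the helper names, and the phrase rule (pairing high-frequency terms that co-occur in a document) are all hypothetical simplifications. The grouping of low-frequency terms into thesaurus classes is not shown.

```python
from collections import Counter
from itertools import combinations


def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # set() so repeated terms in one document count once
    return df


def classify_terms(docs, low_max, high_min):
    """Assign each term to a 'low', 'medium', or 'high' document-frequency band.

    low_max and high_min are hypothetical cut-offs given as fractions of the
    collection size; the abstract does not specify any threshold values.
    """
    if not 0 <= low_max < high_min <= 1:
        raise ValueError("expected 0 <= low_max < high_min <= 1")
    n = len(docs)
    bands = {}
    for term, count in document_frequencies(docs).items():
        frac = count / n
        if frac <= low_max:
            bands[term] = "low"
        elif frac >= high_min:
            bands[term] = "high"
        else:
            bands[term] = "medium"
    return bands


def high_frequency_phrases(docs, bands):
    """Hypothetical phrase rule: pair high-frequency terms co-occurring in a document.

    Returns a Counter mapping each sorted term pair to the number of
    documents in which both terms appear.
    """
    phrases = Counter()
    for doc in docs:
        high = sorted({t for t in doc if bands.get(t) == "high"})
        phrases.update(combinations(high, 2))
    return phrases


if __name__ == "__main__":
    docs = [
        ["retrieval", "system", "index", "query"],
        ["retrieval", "system", "thesaurus"],
        ["retrieval", "index", "phrase"],
        ["retrieval", "system", "evaluation"],
    ]
    bands = classify_terms(docs, low_max=0.25, high_min=0.75)
    print(bands)
    print(high_frequency_phrases(docs, bands))
```

On this toy collection, "retrieval" and "system" land in the high band, "index" in the medium band, and the single-occurrence terms in the low band; the only candidate phrase is the pair ("retrieval", "system"), found in three documents. Replacing a too-common single term with a narrower phrase in this way is one reading of "breaking up" high-frequency terms.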
In order for an automatic information retrieval system to effectively retrieve documents related to...
In this paper, we review statistical techniques for the direct evaluation of d...
The content analysis, or indexing problem, is fundamental in information storage and retrieval. Sev...
Given a written text in natural language, it is convenient to represent the information content of t...
Traditional index weighting approaches for information retrieval from texts depend on the term frequ...
Index terms are an important component in considering a scientific topic. In a real sense, ...
The performance of information retrieval systems can be evaluated in a number of different ways. Mu...
Most existing automatic content analysis and indexing techniques are based on word frequency charac...
The common view of the 'aboutness' of documents is that the index entries (or classificat...
Vocabulary incompatibilities arise when the terms used to index a document collection are largely un...
french@cs.virginia.edu apowell@cnri.reston.va.us Vocabulary incompatibilities arise when the terms u...
The IR society has made efforts in free-term indexing for a long time. By contrast, few efforts are ...