Document representation is important for computer-based text processing. Good document representations must include at least the most salient concepts of the document. Documents exist in a multidimensional space, which makes it difficult to identify which concepts to include. A current problem is measuring the effectiveness of the different strategies that have been proposed to accomplish this task. As a contribution towards this goal, this dissertation studied visual inter-document relationships in a dimensionally reduced space. The same treatment was applied to the full text and to three document representations. Two of the representations were based on the assumption that the salient features in a document set follow the chi-distribution i...
Fig. 1: The TopicView user interface. At left, the Conceptual Content panel presents a Term Table wi...
By allowing judgments based on a small number of exemplar documents to be applied to a larger number...
Sorting through a large set of documents when only some are relevant to the topic at hand is a centr...
The volume of textual information that we encounter on a daily basis continues to grow at an impres...
Conventionally, document classification research focuses on improving the learning capabilities of c...
A wide variety of text analysis applications are based on statistical machine learning techniques. T...
This study proposes and evaluates a document analysis strategy for information retrieval with visua...
In the TREC collection -- a large full-text experimental text collection with widely varying documen...
Texts vary not only by topic, but by style; indeed, often the variation between texts `about the sa...
Abstract. The current availability of information often impairs the tasks of searching, browsing...
Documents are usually represented in the bag-of-words space. However, this representation does not ta...
covers the implementation of software that aims to identify document versions and semantically rela...
In the course of eight TREC Conferences, retrieval performance of all systems started high and then ...
Document similarity measures are crucial components of many text-analysis tasks, including informati...
Acquaintance is the name of a novel vector-space n-gram technique for categorizing documents. The te...