We present a novel hybrid approach for clustering text databases. We use a genetic algorithm to generate and evolve a set of search queries in Apache Lucene format. Each cluster is formed as the set of documents matched by one search query. The queries are optimized to maximize the number of documents returned and to minimize the overlap between clusters (documents returned by more than one query). Where a query contains more than one word, we have found it useful to designate one word as the root and to constrain query construction so that the set of documents returned by any additional query word intersects the set returned by the root word. Multiword queries are interpreted disjunctively. We also describe how a gene can be used to det...
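The objective described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy inverted index `INDEX`, the helper names `is_valid`, `matched_docs` and `fitness`, and the exact scoring rule (documents matched by exactly one query minus documents matched by several) are all assumptions made for illustration. The abstract only states that coverage is maximized and overlap minimized.

```python
from collections import Counter

# Toy inverted index (hypothetical data): word -> ids of documents containing it.
INDEX = {
    "goal": {0, 1, 2},
    "striker": {1, 2, 3},
    "election": {4, 5, 6},
    "senate": {5, 6, 7},
    "recipe": {8},
}


def is_valid(query, index):
    """Root constraint: every extra word must share a document with the root word.

    The first word of the query is treated as the root (an assumed encoding).
    """
    root_docs = index.get(query[0], set())
    return all(index.get(word, set()) & root_docs for word in query[1:])


def matched_docs(query, index):
    """Disjunctive (OR) interpretation: union of the documents for each word."""
    docs = set()
    for word in query:
        docs |= index.get(word, set())
    return docs


def fitness(queries, index):
    """Assumed score: documents hit by exactly one query minus documents hit by several."""
    if not all(is_valid(q, index) for q in queries):
        return float("-inf")  # reject query sets that break the root constraint
    hits = Counter(doc for q in queries for doc in matched_docs(q, index))
    unique = sum(1 for n in hits.values() if n == 1)
    overlap = sum(1 for n in hits.values() if n > 1)
    return unique - overlap
```

Under these assumptions, a genetic algorithm would treat each individual as a list of queries and use `fitness` to rank individuals. For example, `[["goal", "striker"], ["election", "senate"]]` covers eight documents with no overlap, whereas `[["goal", "election"]]` is rejected because "election" shares no document with the root word "goal".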
Clustering is an unsupervised machine learning technique, which involves discovering different clust...
Document clustering is a technique in which relationships between sets of documents are autom...
Web search result clustering aims to facilitate information search on the Web. Rather than the resul...
Search queries define a set of documents located in a collection and can be used to rank the documen...
Document clustering techniques have been widely applied in Information Retrieval to reorganize resul...
The goal of clustering web search results is to reveal the semantics of the retrieved documents. The...
People use web search engines to fill a wide variety of navigational, informational and transactiona...
In this paper, we develop a genetic algorithm method based on a latent semantic model (GAL) ...
Web users are demanding more out of current search engines. This can be noticed by the behaviour of ...
We propose an evolutionary approach based on a genetic algorithm for text document clustering....
In a world flooded with information, document clustering is an important tool that can help categori...
Conventional document retrieval systems (e.g., Alta Vista) return long lists of ranked documents in ...
In this article, we use a genetic algorithm to evolve seven different types of Lucene search query ...
We describe a method for generating accurate, compact, human understandable text classifiers. Text ...