Nowadays, almost all text corpora, such as blogs, emails and RSS feeds, are collections of text streams. The traditional vector space model (VSM), or bag-of-words representation, cannot capture the temporal aspect of these text streams. So far, only a few bursty features have been proposed to create text representations with temporal modeling for text streams. We propose bursty feature representations that perform better than VSM on various text mining tasks, such as document retrieval, topic modeling and text categorization. For text clustering, we propose a novel framework to generate a bursty distance measure. We evaluated it on the UPGMA, Star and K-Medoids clustering algorithms. The bursty distance measure did not only perform equally w...
In this paper, we address exploratory analysis of textual data streams and we propose a bootstrappin...
This paper describes a text mining tool that performs two tasks, namely document clustering and text...
We describe a large scale system for clustering a stream of news articles that was developed as part...
Text classification is a major data mining task. An advanced text classification technique is known ...
Detecting and using bursty patterns to analyze text streams has been one of the fundamental approach...
Mining retrospective events from text streams has been an important research topic. Classic text rep...
Thousands of documents are made available to the users via the web on a daily basis. One of the most...
Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases...
Bursty features in text streams are very useful in many text mining applications. Most existing stud...
Real-world events of general interest trigger engaging discussions among people for short bursts in t...
A fundamental problem in text data mining is to extract meaningful structure from document streams ...
Text mining, in particular clustering, is mostly used by search engines to increase the recall an...
Many document collections are by nature dynamic, evolving as the topics or events they describe chan...
Being high-dimensional and relevant in semantics, text clustering is still an important topic in dat...
Data mining, also known as knowledge discovery in databases (KDD), is the process of discovering interes...