This paper proposes PASTA (PArameter-free Solutions for Textual Analysis), a large-scale engine providing strategies to automatically tune the algorithm parameters for the whole text clustering process. A data weighting strategy (e.g., TF-IDF) and a transformation method of input data (e.g., LSI) are explored before performing the cluster analysis to reduce sparseness and make the overall analysis problem more tractable. PASTA includes auto-selection strategies to relieve the end user of parameter tuning and achieve good-quality clustering results. PASTA's current implementation runs on Apache Spark, a state-of-the-art distributed computing framework. As a case study, PASTA has been validated on three collections of Wiki...
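The idea of weighting documents and then selecting a clustering parameter automatically can be illustrated with a small single-machine sketch. It is not PASTA's implementation: it skips the LSI transformation and Apache Spark entirely, and it assumes the silhouette coefficient as the quality index and a plain K-means on cosine distance as the clustering algorithm, neither of which is confirmed by the abstract. All function names (`tfidf`, `kmeans`, `silhouette`, `select_k`) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (dict) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf(docs):
    """Weight each whitespace-tokenised document with TF-IDF, then normalise."""
    tokens = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokens for t in set(toks))
    return [
        normalise({t: (c / len(toks)) * math.log(len(docs) / df[t])
                   for t, c in Counter(toks).items()})
        for toks in tokens
    ]


def distance(a, b):
    """Cosine distance between two unit-norm sparse vectors."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=20):
    """K-means on cosine distance with deterministic farthest-first seeding."""
    centres = [vectors[0]]
    while len(centres) < k:
        centres.append(
            max(vectors, key=lambda v: min(distance(v, c) for c in centres)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: distance(v, centres[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:  # keep the old centre if a cluster empties
                total = Counter()
                for v in members:
                    total.update(v)
                centres[j] = normalise(total)
    return labels


def silhouette(vectors, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    scores = []
    for i, l in enumerate(labels):
        own = [j for j in clusters[l] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)
            continue
        a = sum(distance(vectors[i], vectors[j]) for j in own) / len(own)
        b = min(sum(distance(vectors[i], vectors[j]) for j in idx) / len(idx)
                for m, idx in clusters.items() if m != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def select_k(vectors, max_k):
    """Try K = 2..max_k and keep the K with the highest mean silhouette."""
    best = None
    for k in range(2, max_k + 1):
        labels = kmeans(vectors, k)
        score = silhouette(vectors, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best


docs = [
    "spark cluster distributed computing",
    "distributed computing spark engine",
    "spark distributed cluster engine",
    "pasta recipe tomato sauce",
    "tomato sauce pasta basil",
    "basil pasta recipe tomato",
]
best_k, best_score, best_labels = select_k(tfidf(docs), max_k=4)
print(best_k, best_labels)
```

The end user supplies only an upper bound on K; the loop in `select_k` stands in for the kind of auto-selection strategy the abstract describes. A distributed version would replace the in-memory lists with partitioned data and would typically apply a dimensionality reduction such as LSI before clustering.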
The methods available for structuring the collections are classification methods and clustering me...
Nowadays, the explosive growth in text data emphasizes the need for developing new and computational...
Data mining, also known as knowledge discovery in databases (KDD), is the process of discovering interes...
In this work we argue for a new self-learning engine able to suggest to the analyst good transfo...
In this paper we propose a new self-learning engine to streamline the analytics process, as it enabl...
Text clustering is a central problem in unsupervised learning for discovering interesting patterns in...
ToPIC (Tuning of Parameters for Inference of Concepts) is a distributed self-tuning engine whose aim...
Document clustering is a very hard task in automatic text processing since it requires extracting re...
Cluster analysis refers to a family of procedures which are fundamentally concerned with automati...
This chapter describes a novel multistage method for linguistic clustering of large collections of t...
Some simple processing techniques have allowed the application of a standard software package to the...
Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases...
Text or document clustering is a subset of a larger field of data clustering and has been one of the...
Abstract- A growing number of documents are stored digitally, such as journals, e-books, bulletins and...
The text clustering problem (TCP) is a leading process in many key areas such as information retrieval, ...