In this work we argue for a new self-learning engine that suggests to the analyst suitable transformation methods and weighting schemas for a given data collection. This new generation of systems, named SELF-DATA (SELF-learning DAta TrAnsformation), relies on an engine that explores different data weighting schemas (e.g., normalized term frequencies, logarithmic entropy) and data transformation methods (e.g., PCA, LSI) before applying a given data mining algorithm (e.g., cluster analysis), evaluates and compares the resulting solutions through different quality indices (e.g., weighted Silhouette), and presents the top-3 solutions to the analyst. SELF-DATA will also include a knowledge database storing results of experiments on previously proc...
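The explore, evaluate, and rank loop described in this abstract can be illustrated with a minimal sketch. It is not the SELF-DATA implementation: the function names, the toy term-count matrix `DOCS`, and the candidate values of `k` are all hypothetical. As simplifying assumptions, plain k-means stands in for the data mining algorithm, the unweighted mean Silhouette stands in for the weighted Silhouette, a log-scaled term frequency stands in for logarithmic entropy, and the transformation step (PCA, LSI) is omitted to keep the example dependency-free.

```python
import itertools
import math
import random

# Candidate weighting schemas (illustrative stand-ins, not the paper's exact definitions).
def raw_tf(row):
    return [float(x) for x in row]

def normalized_tf(row):
    total = sum(row)
    return [x / total if total else 0.0 for x in row]

def log_tf(row):
    return [math.log(1 + x) for x in row]

WEIGHTINGS = {"raw_tf": raw_tf, "normalized_tf": normalized_tf, "log_tf": log_tf}

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, seed=0, iters=20):
    """Plain Lloyd's k-means with a fixed seed; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Unweighted mean Silhouette; returns -1.0 if fewer than two clusters exist."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: Silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def top_solutions(docs, ks=(2, 3), top_n=3):
    """Try every (weighting, k) pair and return the top_n by Silhouette."""
    results = []
    for (name, weight), k in itertools.product(WEIGHTINGS.items(), ks):
        points = [weight(row) for row in docs]
        labels = kmeans(points, k)
        results.append((silhouette(points, labels), name, k))
    results.sort(reverse=True)  # highest quality index first
    return results[:top_n]

# Hypothetical document-term counts: 6 documents, 3 terms.
DOCS = [[5, 0, 1], [4, 1, 0], [0, 6, 1], [1, 5, 0], [0, 1, 7], [1, 0, 6]]
best = top_solutions(DOCS)
```

Each entry of `best` is a `(score, weighting_name, k)` tuple, so an analyst-facing layer could display the three configurations directly. A transformation step would slot in between the weighting and the clustering call.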
Data mining refers to extracting and identifying useful information from large sets of data. This term is ...
This edited book focuses on the latest developments in classification, statistical learning, data an...
In this paper we propose a new self-learning engine to streamline the analytics process, as it enabl...
Large volumes of data are being collected at an ever increasing rate in various modern applications,...
This paper proposes PASTA (PArameter-free Solutions for Textual Analysis), a large scale engine prov...
ToPIC (Tuning of Parameters for Inference of Concepts) is a distributed self-tuning engine whose aim...
There has been an explosion in unstructured text data in recent years with services like Twitter, Fa...
Summarization: Data mining is an interdisciplinary subfield of computer science. It forms the comput...
Proliferation of the World Wide Web has massively increased the availability of textual data in rece...
Application of different clustering techniques can result in different basic data set partitions emp...
Cluster analysis of textual documents is a common technique for better filtering, navigation, under-st...
Unsupervised learning has important applications in extremely large data settings such as in medical...
Data analysis is changing fast. Driven by a vast range of application domains and affordable tools, ...
As the amount and variety of data increases through technological and investigative advances, the me...