Data lakes for clustering
-------------------------

These are the research materials that accompany the article "On Exploring Data Lakes by Finding Compact, Isolated Clusters", by Patricia Jiménez, Juan C. Roldán, and Rafael Corchuelo.

This package includes the following:

- "data-lakes": each subfolder corresponds to a data lake, and each CSV file inside a data lake corresponds to a dataset. The last column of each dataset is called "clazz", but it is set to "0" in all cases. A few of the original datasets had a class, but it was removed to ensure that neither RóMULO nor the other competitors use it.

- "results": it provides the results of testing RóMULO and other competitors on the previous data lakes. The results consist of several "*-r...
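The "data-lakes" layout described above (one subfolder per data lake, one CSV file per dataset, a trailing "clazz" column that carries no information) could be read with a short script along the following lines. This is only a sketch under stated assumptions: the function names `load_data_lake` and `load_all_lakes` are hypothetical and not part of the package, and the code assumes comma-separated, UTF-8 files whose first row is a header.

```python
import csv
from pathlib import Path


def load_data_lake(lake_dir):
    """Read every CSV dataset found directly inside one data-lake folder.

    Returns a dict that maps the dataset name (file name without ".csv")
    to a (header, rows) pair. Values are kept as strings.
    Assumption: each file starts with a header row.
    """
    datasets = {}
    for csv_path in sorted(Path(lake_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])
            rows = [row for row in reader if row]  # skip blank lines
        # The README states that "clazz" is always "0", so drop it if present.
        if header and header[-1] == "clazz":
            header = header[:-1]
            rows = [row[:-1] for row in rows]
        datasets[csv_path.stem] = (header, rows)
    return datasets


def load_all_lakes(root):
    """Map each subfolder name under `root` to its loaded datasets."""
    return {
        lake.name: load_data_lake(lake)
        for lake in sorted(Path(root).iterdir())
        if lake.is_dir()
    }
```

For instance, `lakes = load_all_lakes("data-lakes")` would return one entry per data lake, and `lakes[name][dataset]` would give the header and rows of a single dataset with the constant "clazz" column removed.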
Abstract. Cluster analysis deals with the automatic discovery of the grouping of a set of patterns. ...
Data clustering is a common technique for statistical data analysis. The task is to group objects...
Finding compact and well-separated clusters in data sets is a challenging task. Most clustering algo...
Data engineers are very interested in data lake technologies due to the incredible abundance of dat...
This package provides the software and the data required to perform the experimentation that accompa...
Research on the problem of clustering tends to be fragmented across the pattern recognition, databas...
Clustering algorithms divide data into meaningful or useful groups, called clusters, such that the i...
Clustering is a division of data into groups of similar objects. Representing the data by fewer clus...
Clustering Geo-Data Cubes (CGC) is a Python package to perform clustering analysis for multidimensio...
Cluster Analysis aims at finding subsets (clusters) of a given set of entities, which are homogeneou...
Clustering or cluster analysis [5] is a method in unsupervised learning and one of the most used tec...
Recent advances in clustering have shown that ensuring a minimum separation between cluster centroid...
This research primarily focused on finding differences in various distancing methods used in the k-m...
This file contains a number of randomly generated datasets. The properties of each dataset are indi...
Clustering, as an important unsupervised learning technique, is widely used to discover the inherent s...