Data engineers are very interested in data lake technologies due to the incredible abundance of datasets. They typically use clustering to understand the structure of the datasets before applying other methods to infer knowledge from them. This article presents the first proposal that explores how to use a meta-heuristic to address the problem of multi-way single-subspace automatic clustering, which is very appropriate in the context of data lakes. It was compared against five strong competitors that combine the state-of-the-art attribute selection proposal with three classical single-way clustering proposals, a recent quantum-inspired one, and a recent deep-learning one. The evaluation focused on exploring their ability to find comp...
Unsupervised learning is widely recognized as one of the most important challenges facing machine le...
We examine whether the quality of different clustering algorithms can be compared by a general, scient...
which permits unrestricted use, distribution, and reproduction in any medium, provided the original ...
Data lakes for clustering
-------------------------
These are the research materials that accompany...
Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grou...
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, pri...
In this dissertation the problem of grouping a data set $\mathcal{A}\subset\mathbb{R}$ into $k$ disjunc...
Research on the problem of clustering tends to be fragmented across the pattern recognition, databas...
We develop a computer-assisted method for the discovery of insightful conceptualizations, in the for...
In classical clustering, each data point is assigned to at least one cluster. However, in many ...
Clustering methods are particularly well-suited for identifying classes in spatial databases. Howeve...
We consider a vital issue in information grouping and present a few answers for it. We explore utilizing separat...
Scalable algorithm design has become central in the era of large-scale data analysis. The vast amoun...
A growing number of data-based applications are used for decision-making that have far-reaching cons...
We review the time and storage costs of search and clustering algorithms. We exemplify these, based ...