Variable selection and other dimensionality reduction methods are more important than ever. Data sets keep growing, and very large data sets can be cumbersome, or even impossible, to analyse with many methods. This thesis attempts to improve upon an established method of variable selection for clustering and classification by making it robust to outliers. This is done by initializing with a mixture model of contaminated normal distributions. Under this model, each observation is assigned to a cluster and, within that cluster, to a subgroup of either good observations or outliers. The variable indicating membership of the good-observation subgroup can be used as a weight measure ...
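The abstract does not give the thesis's parameterization, so the following is only a minimal sketch of how a good-observation weight could be computed for a single cluster. It assumes the commonly used form of the contaminated normal, in which a proportion alpha of points follow N(mean, cov) and the rest follow N(mean, eta * cov) with eta > 1. The names mvn_pdf, good_probability, alpha and eta are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def mvn_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance, using a linear solve rather than an explicit inverse.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    log_det = np.linalg.slogdet(cov)[1]
    return np.exp(-0.5 * (d * np.log(2 * np.pi) + log_det + maha))

def good_probability(x, mean, cov, alpha, eta):
    """Posterior probability that each row of x is a 'good' observation.

    Assumed model (hypothetical parameter names):
        alpha * N(mean, cov) + (1 - alpha) * N(mean, eta * cov), with eta > 1.
    """
    good = alpha * mvn_pdf(x, mean, cov)
    bad = (1.0 - alpha) * mvn_pdf(x, mean, eta * cov)
    return good / (good + bad)

# Illustrative values only: one point at the cluster centre, one far away.
mean = np.zeros(2)
cov = np.eye(2)
x = np.array([[0.0, 0.0], [6.0, 6.0]])
v = good_probability(x, mean, cov, alpha=0.9, eta=10.0)
```

In this sketch, values of v near 1 mark points that look like ordinary cluster members and values near 0 mark likely outliers, which is the sense in which such a quantity could serve as a weight.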
Summary. Variable selection for clustering is an important and challenging problem in high-dimension...
The thesis tackles the problem of uncovering hidden structures in high-dimensional data in the prese...
Cluster analysis of binary data is a relatively poorly developed task in comparison with cluster ana...
Variable selection for cluster analysis is a difficult problem. The difficulty originates not only f...
Clustering remains a vibrant area of research in statistics. Although there are many books on this t...
Thesis (Ph.D.)--University of Rochester. School of Medicine & Dentistry. Dept. of Biostatistics & Co...
Searching a dataset for the “natural grouping / clustering” is an important explanatory technique ...
Outlier identification is important in many applications of multivariate analysis. Either because th...
We introduce a robust k-means-based clustering method for high-dimensional data where not only outli...
We compare two major approaches to variable selection in clustering: model selection and regularizat...