Variable selection and other dimensionality reduction methods are more important than ever. Data sets keep growing, and very large data sets can be cumbersome, or even impossible, to analyse with many methods. This thesis attempts to improve upon an established method of variable selection for clustering and classification by making it robust to outliers. This is done by initializing with a mixture model of contaminated normal distributions. Under this model, each observation is assigned to a cluster, and each cluster is made up of a subgroup of good observations and a subgroup of outlier observations. The variable indicating membership to the good observation subgroup can be used as a weight measure ...
We propose a model-based clustering procedure where each component can take into account cluster-spe...
Statistical analysis of data sets of high-dimensionality has met great interest over the past years,...
Outliers can be extremely harmful when applying well-known Cluster Analysis methods. Moreover, clu...
Variable selection for cluster analysis is a difficult problem. The difficulty originates not only f...
Clustering remains a vibrant area of research in statistics. Although there are many books on this t...
Thesis (Ph.D.)--University of Rochester. School of Medicine & Dentistry. Dept. of Biostatistics & Co...
Searching a dataset for the “natural grouping / clustering” is an important explanatory technique ...
Outlier identification is important in many applications of multivariate analysis. Either because th...
We introduce a robust k-means-based clustering method for high-dimensional data where not only outli...
We compare two major approaches to variable selection in clustering: model selection and regularizat...
Summary. Variable selection for clustering is an important and challenging problem in high-dimension...
The thesis tackles the problem of uncovering hidden structures in high-dimensional data in the prese...
Cluster analysis of binary data is a relatively poorly developed task in comparison with cluster ana...
The following mixture model-based clustering methods are compared in a simulation study with one-dim...