The ubiquity of high-dimensional data in machine learning and data mining applications makes its efficient indexing and retrieval from main memory crucial. Machine learning algorithms frequently need to query specific characteristics of single multidimensional points. For example, given a clustered dataset, the cluster membership (CM) query retrieves the cluster to which an object belongs. To answer this type of query efficiently, we have developed STATS, a novel main-memory index that scales to answer CM queries on increasingly large datasets. Current indexing methods are oblivious to the structure of clusters in the data, and we thus develop STATS around the key insight that exploiting the cluster information when indexing and prese...
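To make the cluster membership (CM) query described above concrete, the following is a minimal sketch of a naive in-memory baseline: a hash map from each point to its cluster label. It is not the STATS index, whose internals are not given here; the names `build_cm_index` and `cm_query` and the toy data are hypothetical and serve only to illustrate what the query returns.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def build_cm_index(points: Sequence[Point], labels: Sequence[int]) -> Dict[Point, int]:
    """Naive baseline (an assumption, not STATS): map each point to its cluster label."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    return dict(zip(points, labels))


def cm_query(index: Dict[Point, int], point: Sequence[float]) -> Optional[int]:
    """Return the cluster label of an indexed point, or None if the point is unknown."""
    return index.get(tuple(point))


# Toy clustered dataset: two points in cluster 0, one point in cluster 1.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1)]
labels = [0, 0, 1]
index = build_cm_index(points, labels)
```

A lookup such as `cm_query(index, (5.0, 5.1))` returns the stored label for that exact point. This baseline keeps a full copy of every point as a key and ignores cluster structure entirely, which is the kind of obliviousness the abstract contrasts STATS against.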
The notorious "dimensionality curse" is a well-known phenomenon for any multi-dimensional i...
This is an Open Access article published under a Creative Commons Attribution 3.0 Unported (CC BY 3....
cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. High-dimen...
We review the time and storage costs of search and clustering algorithms. We exemplify these, based ...
The emergence of novel database applications has resulted in the prevalence of a new paradigm for si...
We consider the problem of efficiently finding a high-quality k-clustering of n points in a (possibl...
Indexing high-dimensional data is useful in many real-world applications. Especially the infor...
The notorious "dimensionality curse" is a well-known phenomenon for any multi-dimensional indexes ...
A very promising idea for fast searching in traditional and multimedia databases is to map objects i...
More and more data are produced every day. Some clustering techniques have been developed to automat...
The k-means clustering algorithm has a long history and proven practical performance; however, it d...
The main focus of my research is to design effective learning techniques for information retrieval a...
Searching in a dataset for elements that are similar to a given query element is a core problem in a...
The notorious "dimensionality curse" is a well-known phenomenon for any multi-dimensional indexes at...