Modern applications of search and learning must cope with datasets containing billions of examples in billions, or even billions squared, of dimensions (e.g., text documents represented by high-order n-grams). In this talk, we will first present the use of very sparse random projections (Li, Hastie, Church, KDD 2006) for learning with high-dimensional data. Perhaps surprisingly, the projection matrix can be extremely sparse (e.g., 0.1% or fewer nonzeros) without hurting the learning performance. For binary sparse data (which are common in practice), however, b-bit minwise hashing (Li and König, Communications of the ACM 2011) turns out to be much more efficient than random projections. In addition, the recent development of one-permutation hashing (Li, Owen, Zhang, NIPS 2012) substantially reduces the preprocessing cost of minwise hashing, by replacing the k independent permutations with a single permutation of the feature space.
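To make the first idea concrete, here is a minimal NumPy sketch of very sparse random projections in the style of Li, Hastie, Church (2006): each projection-matrix entry is +sqrt(s), 0, or -sqrt(s) with probabilities 1/(2s), 1 - 1/s, 1/(2s), so the matrix has nonzero density 1/s (s = 1000 gives the 0.1% figure above). The function name and parameters are illustrative, not from the talk.

```python
import numpy as np

def very_sparse_random_projection(X, k, s, seed=0):
    """Project the rows of X (n x d) into k dimensions using a very
    sparse random matrix whose entries are +sqrt(s), 0, -sqrt(s) with
    probabilities 1/(2s), 1 - 1/s, 1/(2s). Density of nonzeros is 1/s."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    u = rng.random((d, k))
    R = np.zeros((d, k))
    R[u < 1.0 / (2.0 * s)] = np.sqrt(s)          # +sqrt(s) w.p. 1/(2s)
    R[u > 1.0 - 1.0 / (2.0 * s)] = -np.sqrt(s)   # -sqrt(s) w.p. 1/(2s)
    # Entries of R have mean 0 and variance 1, so (1/sqrt(k)) X R
    # preserves inner products and distances in expectation.
    return (X @ R) / np.sqrt(k)

# Example: d = 10,000 features, s = 1000 -> only ~0.1% of R is nonzero.
rng = np.random.default_rng(1)
X = (rng.random((100, 10000)) < 0.01).astype(float)  # sparse binary data
Z = very_sparse_random_projection(X, k=128, s=1000)
```

This dense-matrix version is only for exposition; in practice one would materialize just the nonzero positions of R, which is exactly where the 1/s sparsity pays off.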
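For the second idea, the following is a hedged sketch of b-bit minwise hashing for binary sparse data (sets of feature ids). It approximates random permutations with random linear hash functions, a common simplification; the function names, the choice of prime, and the simplified resemblance estimator (which ignores the set-size correction terms in the exact Li-König estimator, reasonable for very sparse data) are assumptions for illustration.

```python
import numpy as np

def b_bit_minhash(sets, k, b, seed=0):
    """Compute k minwise hashes per (non-empty) set and keep only the
    lowest b bits of each minimum. Random permutations are approximated
    by linear hashes h(x) = (a*x + c) mod p, a standard simplification."""
    rng = np.random.default_rng(seed)
    p = (1 << 31) - 1                      # Mersenne prime; feature ids < p
    a = rng.integers(1, p, size=k)
    c = rng.integers(0, p, size=k)
    sketches = np.empty((len(sets), k), dtype=np.uint32)
    for i, s in enumerate(sets):
        x = np.fromiter(s, dtype=np.int64)
        h = (a[:, None] * x[None, :] + c[:, None]) % p   # k x |s| hashes
        sketches[i] = h.min(axis=1) & ((1 << b) - 1)     # keep lowest b bits
    return sketches

def estimate_resemblance(s1, s2, b):
    """Estimate Jaccard similarity from two b-bit sketches, correcting
    for the 1/2^b chance that unrelated b-bit values collide.
    (Simplified: drops the set-size-dependent terms of the exact estimator.)"""
    match = np.mean(s1 == s2)
    chance = 1.0 / (2 ** b)
    return (match - chance) / (1.0 - chance)

sk = b_bit_minhash([{1, 5, 9, 200}, {1, 5, 9, 300}], k=512, b=1)
print(estimate_resemblance(sk[0], sk[1], b=1))  # roughly 3/5
```

Storing only b bits per hash (e.g., b = 1) shrinks the sketch by a factor of 32 or 64 relative to full minwise hashes, at the cost of needing somewhat larger k; this trade-off is the source of the efficiency gain over random projections for binary data.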
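Finally, a rough sketch of the one-permutation idea, under the usual description of the scheme: permute the feature space once, split it into k equal bins, and record the smallest permuted id present in each bin (with a sentinel for empty bins). All names here are illustrative, and empty-bin handling is deliberately left minimal.

```python
import numpy as np

def one_permutation_hash(sets, k, D, seed=0):
    """One-permutation hashing: apply a single permutation of the D
    feature ids, divide the permuted space into k equal bins, and keep
    the smallest offset seen in each bin (sentinel EMPTY if none)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(D)              # one permutation, not k
    bin_size = D // k                      # assume k divides D
    EMPTY = np.iinfo(np.int64).max
    sketches = np.full((len(sets), k), EMPTY, dtype=np.int64)
    for i, s in enumerate(sets):
        pos = perm[np.fromiter(s, dtype=np.int64)]   # permuted positions
        for bn, off in zip(pos // bin_size, pos % bin_size):
            if off < sketches[i, bn]:
                sketches[i, bn] = off
    return sketches
```

The preprocessing advantage over classical minwise hashing is that each data point is scanned once under one permutation, rather than k times under k independent permutations.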