This paper focuses on large-scale unsupervised feature selection from text. We expand upon the recently proposed Compressive Feature Learning (CFL) framework, a method that uses dictionary-based compression to select a K-gram representation for a document corpus. We show that CFL is NP-Complete and provide a novel and efficient approximation algorithm based on a homotopy that transforms a convex relaxation of CFL into the original problem. Our algorithm allows CFL to scale to corpora of millions of documents because each step is linear in the corpus length and highly parallelizable. We use it to extract features from the BeerAdvocate dataset, a corpus of over 1.5 million beer reviews spanning 10 years. CFL uses two orders of m...
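To make the idea of dictionary-based compression over K-grams concrete, here is a minimal toy sketch. It is not the paper's algorithm: it assumes a document is rebuilt by concatenating dictionary K-grams without overlap, and that the total cost is the number of pointers plus a weighted dictionary storage cost. The function names and the `dict_weight` parameter are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set, Tuple

KGram = Tuple[str, ...]


def all_kgrams(tokens: Sequence[str], max_k: int) -> Set[KGram]:
    """Collect every contiguous k-gram with 1 <= k <= max_k."""
    return {
        tuple(tokens[i:i + k])
        for k in range(1, max_k + 1)
        for i in range(len(tokens) - k + 1)
    }


def min_pointers(tokens: Sequence[str], dictionary: Set[KGram]) -> float:
    """Fewest dictionary entries whose concatenation reproduces `tokens`.

    Simplifying assumption: pointers may not overlap. Returns infinity
    when the dictionary cannot reproduce the document.
    """
    n = len(tokens)
    max_k = max((len(g) for g in dictionary), default=0)
    best = [float("inf")] * (n + 1)  # best[i]: pointers needed for tokens[:i]
    best[0] = 0
    for end in range(1, n + 1):
        for k in range(1, min(max_k, end) + 1):
            if tuple(tokens[end - k:end]) in dictionary:
                best[end] = min(best[end], best[end - k] + 1)
    return best[n]


def compressed_cost(
    tokens: Sequence[str], dictionary: Set[KGram], dict_weight: float = 1.0
) -> float:
    """Pointer count plus weighted storage cost (assumed: one unit per token)."""
    storage = sum(len(g) for g in dictionary)
    return min_pointers(tokens, dictionary) + dict_weight * storage


if __name__ == "__main__":
    doc = "a b a b a b".split()
    print(compressed_cost(doc, {("a",), ("b",)}))  # unigram dictionary
    print(compressed_cost(doc, {("a", "b")}))      # single bigram dictionary
```

In this toy setting, a dictionary holding the repeated bigram compresses the document more cheaply than a unigram dictionary. That is the intuition behind using compression cost to choose which K-grams to keep as features. Searching over all candidate dictionaries is the hard combinatorial part that the abstract addresses with a relaxation.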
In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature se...
The last decade has witnessed explosive growth in data. The ultrahigh-dimensional and large volume d...
This thesis studies the generalization behavior of algorithms in Sample Compression Settings. It ext...
Feature selection methods are often applied in the context of document classification. They are part...
In text classification based on the bag-of-words (BoW) or similar representations, we usually have a...
Text classification plays an important role in various applications of big data by automatically cla...
This article considers "compressive learning," an approach to large-scale machine learning where dat...
The automatic discovery of a significant low-dimensional feature representation from a given data se...
In this article, we describe an unsupervised feature selection algorithm suitable for data ...
Given the overwhelming quantities of data generated every day, there is a pressing need for tools th...
Application of a feature selection algorithm to a textual data set can improve the performance of so...
Finding a dataset of minimal cardinality to characterize the optimal parameters of a model is of par...
The majority of machine learning research has been focused on building models and inference techniqu...
We present a new algorithm for large-scale unsupervised text classification. Our method views eac...
With the rapid development of the Internet, the last decade has witnessed explosive growth in data. ...