We consider supervised learning in the presence of many irrelevant features, and study two different regularization methods for preventing overfitting. Focusing on logistic regression, we show that using L1 regularization of the parameters, the sample complexity (i.e., the number of training examples required to learn "well") grows only logarithmically in the number of irrelevant features. This logarithmic rate matches the best known bounds for feature selection, and indicates that L1 regularized logistic regression can be effective even if there are exponentially more irrelevant features than there are training examples. We also give a lower bound showing that any rotationally invariant algorithm, including logistic regression with L2 re...
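The claim above can be illustrated empirically. The following is a minimal sketch (not code from the paper): L1-regularized logistic regression fitted by proximal gradient descent (ISTA) on synthetic data where only 3 of 100 features are relevant. The data-generating setup, penalty strength, and step size are all illustrative assumptions.

```python
import numpy as np

# Synthetic data: 3 relevant features, 97 irrelevant ones (assumed setup).
rng = np.random.default_rng(0)
n, d, k = 200, 100, 3
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:k] = 2.0                              # only the first k features matter
y = (X @ w_true + 0.1 * rng.standard_normal(n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_l1_logreg(X, y, lam=0.1, lr=0.1, iters=2000):
    """Proximal gradient descent for logistic loss + lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - y) / n    # gradient of logistic loss
        w = w - lr * grad
        # Soft-thresholding: the proximal operator of the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w_hat = fit_l1_logreg(X, y)
print("nonzero coefficients:", np.flatnonzero(np.abs(w_hat) > 1e-6))
```

The soft-thresholding step is what drives most of the irrelevant coefficients exactly to zero, which is the sparsity behavior the sample-complexity result relies on.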
Numerous approaches address over-fitting in neural networks: by imposing a penalty on the parameters...
We begin with a few historical remarks about what might be called the regularization class of statis...
Binary classification is a core data mining task. For large datasets or real-time applications, desi...
<p>A) A two-dimensional example illustrates how a two-class classification between the two data sets ...
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer S...
Regularization aims to improve prediction performance by trading an increase in training error for b...
Features in many real-world applications such as Chemoinformatics, Bioinformatics, and Information Re...
We consider a learning algorithm generated by a regularization scheme with a concave regularizer for...
High-dimensional statistics deals with statistical inference when the number of parameters or featur...
Regularization aims to improve prediction performance of a given statistical modeling approach by mo...
We consider the empirical risk minimization problem for linear supervised lear...
NLP models have many, sparse features, and regularization is key for balancing model overfitting ...
L1/Lp regularization is a regularization approach that has the same sparsifying properties as L1 reg...
In this thesis, we present Regularized Learning with Feature Networks (RLFN), an approach for regula...
In order to avoid overfitting in neural learning, a regularization term is added to the loss funct...
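A penalized loss of this kind can be sketched concretely. Below is a minimal illustration (an assumed setup, not the cited work's method): an L2 penalty added to a squared-error loss for a linear model, trained by gradient descent, with the penalty strength `lam` chosen arbitrarily.

```python
import numpy as np

# Assumed toy data: only feature 0 actually predicts the target.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))
y = X[:, 0] + 0.1 * rng.standard_normal(50)

def fit(lam, lr=0.05, iters=500):
    """Gradient descent on (1/n)||Xw - y||^2 + lam * ||w||^2."""
    w = np.zeros(10)
    for _ in range(iters):
        # Gradient of the data term plus gradient of the L2 penalty.
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w
        w -= lr * grad
    return w

w_reg = fit(lam=1.0)
w_unreg = fit(lam=0.0)
# The penalty term shrinks the weight vector toward zero.
print(np.linalg.norm(w_reg), np.linalg.norm(w_unreg))
```

The only change relative to unregularized training is the extra `2 * lam * w` term in the gradient, which is the general pattern: the regularizer contributes its own gradient to every update.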