On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL Databases

Goetz Graefe
Usama Fayyad
Surajit Chaudhuri

Publication date

January 1998

Abstract

For a wide variety of classification algorithms, scalability to large databases can be achieved by observing that most algorithms are driven by a set of sufficient statistics that are significantly smaller than the data. By relying on a SQL backend to compute the sufficient statistics, we leverage the query processing system of SQL databases and avoid the need to move data to the client. We present a new SQL operator (Unpivot) that enables efficient gathering of statistics with minimal changes to the SQL backend. Our approach results in a significant increase in performance without requiring any changes to the physical layout of the data. We show analytically how this approach outperforms an alternative that requires changing in the data l...
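
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the paper's Unpivot is a SQL operator rather than a Python function. The sketch assumes that the sufficient statistics are counts of (attribute, value, class) co-occurrences; the function names `unpivot` and `sufficient_statistics` and the toy rows are hypothetical, chosen only for illustration.

```python
from collections import Counter


def unpivot(rows, attribute_names, class_name):
    """Yield one (attribute, value, class) triple per attribute of each wide row."""
    for row in rows:
        for attr in attribute_names:
            yield attr, row[attr], row[class_name]


def sufficient_statistics(rows, attribute_names, class_name):
    """Count each (attribute, value, class) triple.

    This plays the role of a GROUP BY with COUNT(*) over the unpivoted rows
    (an assumption about how the statistics would be gathered).
    """
    return Counter(unpivot(rows, attribute_names, class_name))


# Hypothetical toy table: two attributes and one class column.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]

counts = sufficient_statistics(rows, ["outlook", "windy"], "play")
for key, n in sorted(counts.items()):
    print(key, n)
```

The resulting table of counts has one entry per distinct (attribute, value, class) combination, so its size depends on the number of distinct values rather than the number of rows, which is why such statistics can be much smaller than the data.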
