Statistical Model Computation with UDFs

Carlos Ordonez

Publication date

January 2016

Abstract

Statistical models are generally computed outside a DBMS due to their mathematical complexity. We introduce techniques to efficiently compute fundamental statistical models inside a DBMS exploiting User-Defined Functions (UDFs). We study the computation of linear regression, PCA, clustering and Naive Bayes. Two summary matrices on the data set are mathematically shown to be essential for all models: the linear sum of points and the quadratic sum of cross-products of points. We consider two layouts for the input data set: horizontal and vertical. We first introduce efficient SQL queries to compute summary matrices and to score the data set. Based on the SQL framework, we introduce UDFs that work in a single table scan: aggregate UDFs to comp...
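To make the two summary matrices concrete, the following is a minimal sketch in plain Python, not the SQL or UDF code the abstract refers to. It assumes the notation n for the number of points, L for the linear sum of points, and Q for the quadratic sum of cross-products; the function names `summarize` and `mean_and_covariance` are hypothetical and chosen only for illustration. It accumulates n, L and Q in one pass over the data, then uses the standard identity that the mean and the population covariance can be recovered from those three quantities alone.

```python
from typing import Iterable, List, Sequence, Tuple


def summarize(points: Iterable[Sequence[float]]) -> Tuple[int, List[float], List[List[float]]]:
    """Single pass over the points: return (n, L, Q).

    n is the point count, L[a] = sum of x[a], Q[a][b] = sum of x[a] * x[b].
    The names n, L, Q are assumed notation for this sketch.
    """
    n = 0
    L: List[float] = []
    Q: List[List[float]] = []
    for x in points:
        if n == 0:
            # Size the accumulators from the first point's dimensionality.
            d = len(x)
            L = [0.0] * d
            Q = [[0.0] * d for _ in range(d)]
        n += 1
        for a in range(len(L)):
            L[a] += x[a]
            for b in range(len(L)):
                Q[a][b] += x[a] * x[b]
    return n, L, Q


def mean_and_covariance(n: int, L: List[float], Q: List[List[float]]):
    """Derive the mean vector and population covariance from n, L, Q only."""
    d = len(L)
    mu = [L[a] / n for a in range(d)]
    cov = [[Q[a][b] / n - mu[a] * mu[b] for b in range(d)] for a in range(d)]
    return mu, cov


points = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
n, L, Q = summarize(points)
mu, cov = mean_and_covariance(n, L, Q)
```

Once n, L and Q are available, the second function never touches the data set again, which is one way to read the abstract's claim that these summaries suffice for the models it lists. How the paper itself derives each model from them is not stated in the abstract.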
