
Statistical Model Computation with UDFs


1 Author(s)
Ordonez, C.; Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA

Statistical models are generally computed outside a DBMS due to their mathematical complexity. We introduce techniques to efficiently compute fundamental statistical models inside a DBMS exploiting User-Defined Functions (UDFs). Specifically, we study the computation of linear regression, PCA, clustering, and Naive Bayes. Two summary matrices on the data set are mathematically shown to be essential for all models: the linear sum of points and the quadratic sum of cross products of points. We consider two layouts for the input data set: horizontal and vertical. We first introduce efficient SQL queries to compute summary matrices and score the data set. Based on the SQL framework, we introduce UDFs that work in a single table scan: aggregate UDFs to compute summary matrices for all models and a set of primitive scalar UDFs to score data sets. Experiments compare UDFs and SQL queries (running inside the DBMS) with C++ (analyzing exported files). In general, UDFs are faster than SQL queries and not much slower than C++. Considering export times, C++ is slower than UDFs and SQL queries. Statistical models based on precomputed summary matrices are computed in a few seconds. UDFs scale linearly and only require one table scan, highlighting their efficiency.
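As a rough illustration of the two summary matrices named in the abstract, the sketch below accumulates them in one pass over a horizontally laid-out data set (one point per row) and then derives the mean vector and covariance matrix, which is the input PCA needs. It is plain Python, not the paper's SQL or UDF code. The names n, L, Q and the helper functions are assumptions made for this example, and the covariance formula is the standard population identity Q/n - mean * mean^T.

```python
def summarize(rows):
    """Single pass over the rows (hypothetical helper, not the paper's UDF).

    Accumulates:
      n : number of points
      L : linear sum of points (d-vector)
      Q : quadratic sum of cross products of points (d x d matrix)
    """
    n = 0
    L = None
    Q = None
    for x in rows:
        if L is None:
            # Size the accumulators from the first point seen.
            d = len(x)
            L = [0.0] * d
            Q = [[0.0] * d for _ in range(d)]
        n += 1
        for a in range(d):
            L[a] += x[a]
            for b in range(d):
                Q[a][b] += x[a] * x[b]
    return n, L, Q


def mean_and_covariance(n, L, Q):
    """Derive the mean and population covariance from the summaries alone."""
    d = len(L)
    mean = [L[a] / n for a in range(d)]
    cov = [[Q[a][b] / n - mean[a] * mean[b] for b in range(d)]
           for a in range(d)]
    return mean, cov


# Toy data set with three 2-dimensional points (illustrative values only).
rows = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
n, L, Q = summarize(rows)
mean, cov = mean_and_covariance(n, L, Q)
```

Once n, L, and Q are stored, the model step no longer touches the data set, which is consistent with the abstract's remark that models based on precomputed summary matrices are computed in a few seconds.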

Published in:

Knowledge and Data Engineering, IEEE Transactions on (Volume: 22, Issue: 12)

Date of Publication:

Dec. 2010
