RPig: A scalable framework for machine learning and advanced statistical functionalities

Authors: MingXue Wang (Network Manage. Lab., Ericsson Ireland, Ireland); Handurukande, S.B.; Nassar, M.

In many domains, such as Telecom, various scenarios necessitate processing large amounts of data using statistical and machine learning algorithms. A noticeable effort has been made to move data management systems into MapReduce parallel processing environments such as Hadoop and Pig. Nevertheless, these systems lack advanced machine learning and statistical analysis features. Frameworks such as Mahout, built on top of Hadoop, support machine learning, but their implementations are at a preliminary stage. For example, Mahout does not provide Support Vector Machine (SVM) algorithms, and it is difficult to use. On the other hand, traditional statistical software tools such as R, which contain comprehensive statistical algorithms for advanced analysis, are widely used. However, such software can only run on a single computer and is therefore not scalable. In this paper, we propose an integrated solution, RPig, which combines the advantages of R (machine learning and statistical analysis capabilities) with the parallel data processing capabilities of Pig. The RPig framework offers a scalable, advanced data analysis solution for machine learning and statistical analysis. Analysis jobs can be easily developed with RPig scripts in high-level languages. We describe the design, the implementation, and an Eclipse-based RPigEditor for the RPig framework. Using application scenarios from the Telecom domain, we show the usage of RPig and how the framework can significantly reduce development effort. The results demonstrate the scalability of our framework and the simplicity of deployment for analysis jobs.

Published in:
Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on

Date of Conference: 3-6 Dec. 2012
