Cloud Based Web Scraping for Big Data Applications

Abstract:

With the penetration of new technologies, the number of internet users and the volume of (mostly unstructured) data they generate on the internet are growing rapidly. As scraping is one of the major means of extracting unstructured data from the internet, we analyze how the scraping process behaves when applied to bulk data extraction. We faced several challenges while scraping large amounts of data, such as encountering CAPTCHAs, storage issues for a large volume of data, the need for intensive computation capacity, and the reliability of data extraction. In this paper, we investigate a cloud-based web scraping architecture that handles storage and computing resources with elasticity on demand using Amazon Web Services (Elastic Compute Cloud and DynamoDB). Our solution tries to address both scraping and feasibility for big data applications in a single cloud-based architecture for data-based industries. We discuss Selenium as one of our tools for web scraping because the web drivers it supports simulate a real user working with a browser. We also analyze the scalability and performance of the proposed cloud-based scraper and describe its advantages over other cloud-based scrapers.

Published in: 2017 IEEE International Conference on Smart Cloud (SmartCloud)

Date of Conference: 03-05 November 2017

Date Added to IEEE Xplore: 23 November 2017

DOI: 10.1109/SmartCloud.2017.28

Conference Location: New York, NY, USA
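
The abstract describes the architecture only at a high level, so the following is a minimal sketch of the general pattern under stated assumptions, not the authors' implementation. It assumes that page fetching and storage are supplied as plain callables: `fetch` is a hypothetical stand-in for a Selenium-driven page load (for example, a wrapper around a WebDriver's `get` and `page_source`), and `store` is a hypothetical stand-in for a DynamoDB batch write. The names `scrape_all`, `chunked`, and `BATCH_LIMIT` are illustrative. The batch size of 25 reflects the per-call item limit of DynamoDB's BatchWriteItem operation.

```python
from concurrent.futures import ThreadPoolExecutor

# DynamoDB's BatchWriteItem accepts at most 25 put/delete requests per call.
BATCH_LIMIT = 25


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def scrape_all(urls, fetch, store, workers=4, retries=2):
    """Fetch URLs in parallel, retry failures, and store results in batches.

    `fetch(url)` returns the page body or raises (hypothetical stand-in for
    a Selenium page load); `store(batch)` persists a list of records
    (hypothetical stand-in for a DynamoDB batch write).
    Returns the number of pages stored.
    """
    def fetch_with_retry(url):
        for attempt in range(retries + 1):
            try:
                return {"url": url, "body": fetch(url)}
            except Exception:
                if attempt == retries:
                    return None  # give up on this URL after the last retry

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order; failed URLs are filtered out.
        results = [r for r in pool.map(fetch_with_retry, urls) if r is not None]

    for batch in chunked(results, BATCH_LIMIT):
        store(batch)
    return len(results)
```

Keeping `fetch` and `store` injectable lets the same loop run against an in-memory list during testing and against real browser and database clients in deployment; the worker count is the knob that an elastic compute layer would scale.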
