Abstract:
In the era of big data, mining and fully exploiting data has become essential for enterprises seeking to improve their competitiveness. Obtaining data that meets a given need quickly and accurately is key to data mining, and web crawlers are currently the main means of data acquisition. To improve crawler performance and achieve accurate, efficient data acquisition, this article surveys recent technical methods and introduces distributed web crawler technology. On this basis, it analyzes distributed crawler methods and crawling strategies based on Scrapy-Redis, a cloud platform, and Nutch. The analysis shows that distributed crawlers have clear advantages for acquiring large-scale Web data. To compare their effect on large-scale data acquisition, verification experiments were carried out on the three distributed crawler methods. The experimental results confirmed the effectiveness of distributed crawler technology for Web data acquisition.
Date of Conference: 11-14 December 2020
Date Added to IEEE Xplore: 12 February 2021