Design of Asynchronous Non-blocking Network Crawler Based on Node.js | IEEE Conference Publication | IEEE Xplore



Abstract:

To improve the crawling efficiency of network crawlers, make full use of the CPU, and process highly concurrent data, this paper proposes an asynchronous, non-blocking network crawler design based on Node.js. Node.js uses a single-threaded model to handle concurrent data: when a client requests a connection, an internal event is triggered. Through non-blocking I/O and an event-driven mechanism, a Node.js program achieves parallelism at the macroscopic level. In addition, Node.js provides many asynchronous I/O APIs at the underlying layer, so successive calls do not have to wait for one another. When an operation completes, its data are processed in a callback, which reduces overhead and complexity. This model is used to write a lightweight web crawler. The crawler acquires a page, extracts the target content, stores it in arrays and JSON files, and then jumps to the next page; together with a step that retrieves detailed information, these steps complete the crawling of web data. In a practical application test, the crawler collected nearly 1,000 records from web pages accurately and efficiently.
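The callback-driven, single-threaded concurrency described in the abstract can be illustrated with a minimal TypeScript sketch. This is not the paper's code. The function `fakeRequest` is a hypothetical stand-in for a real HTTP request that uses `setTimeout` to simulate network latency so the sketch runs offline. The function `crawlAll` (also hypothetical) issues every request without waiting for the previous one and handles each result in a callback when the event loop delivers it.

```typescript
// Minimal sketch of callback-based, non-blocking I/O on Node.js's single thread.
// ASSUMPTION: fakeRequest is a hypothetical stand-in for a real HTTP request;
// setTimeout simulates network latency so the sketch runs without a network.

type Callback = (err: Error | null, body?: string) => void;

// Records the order in which requests are issued and completed.
const eventLog: string[] = [];

// Returns immediately; the callback runs later, when the timer fires on the event loop.
function fakeRequest(url: string, latencyMs: number, cb: Callback): void {
  setTimeout(() => cb(null, `<html>${url}</html>`), latencyMs);
}

// Issues every request back to back without waiting, then reports all bodies
// once the last callback has run.
function crawlAll(urls: Array<[string, number]>, done: (bodies: string[]) => void): void {
  if (urls.length === 0) {
    done([]);
    return;
  }
  const bodies: string[] = [];
  let pending = urls.length;
  for (const [url, latencyMs] of urls) {
    fakeRequest(url, latencyMs, (err, body) => {
      if (err) {
        eventLog.push(`error ${url}`);
      } else {
        eventLog.push(`done ${url}`);
        bodies.push(body ?? "");
      }
      pending -= 1;
      if (pending === 0) done(bodies); // every callback has now fired
    });
    eventLog.push(`issued ${url}`);
  }
}
```

Calling `crawlAll([["/a", 30], ["/b", 10], ["/c", 20]], () => console.log(eventLog))` records all three `issued` entries before any `done` entry, and completions arrive in latency order (`/b`, `/c`, `/a`) rather than issue order. The single thread never blocks; it simply runs each callback as its I/O finishes.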
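The crawl steps listed in the abstract (acquire a page, extract the target content, store it in an array and as JSON, retrieve detailed information, and jump to the next page) might look roughly like the following TypeScript sketch. The page markup (`class="item"`, `class="next"`, `class="detail"`), the regex-based extraction, and the names `crawl`, `extractItems`, `toJson`, and `maxPages` are hypothetical assumptions. The paper does not say which parser or page structure it used, and a production crawler would normally use a real HTML parser. Detail pages found on each list page are requested concurrently with `Promise.all`, in keeping with the non-blocking design.

```typescript
// Sketch of a paginated crawl: list page -> items -> detail pages -> next page.
// ASSUMPTION: the markup patterns below are hypothetical; adapt them to the target site.

type FetchPage = (url: string) => Promise<string>;

interface Item {
  title: string;  // link text on the list page
  url: string;    // detail-page URL
  detail: string; // text extracted from the detail page
}

const NEXT_RE = /<a class="next" href="([^"]+)">/;
const DETAIL_RE = /<p class="detail">([^<]*)<\/p>/;

// Extract the target links from a list page.
function extractItems(html: string): Array<{ url: string; title: string }> {
  const re = /<a class="item" href="([^"]+)">([^<]*)<\/a>/g; // fresh regex, so lastIndex starts at 0
  const found: Array<{ url: string; title: string }> = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    found.push({ url: m[1], title: m[2] });
  }
  return found;
}

async function crawl(startUrl: string, fetchPage: FetchPage, maxPages = 50): Promise<Item[]> {
  const items: Item[] = []; // results accumulate in an array
  let url: string | null = startUrl;
  for (let page = 0; url !== null && page < maxPages; page++) {
    const html: string = await fetchPage(url); // acquire the list page
    const links = extractItems(html);
    // Request all detail pages at once instead of awaiting them one by one.
    const details: string[] = await Promise.all(links.map((l) => fetchPage(l.url)));
    links.forEach((l, i) => {
      const d = DETAIL_RE.exec(details[i]);
      items.push({ url: l.url, title: l.title, detail: d ? d[1] : "" });
    });
    const next: RegExpExecArray | null = NEXT_RE.exec(html); // jump to the next page, if any
    url = next ? next[1] : null;
  }
  return items;
}

// Serialize the collected array as JSON text, ready to be written to a file.
function toJson(items: Item[]): string {
  return JSON.stringify(items, null, 2);
}
```

On Node.js 18 or later, the built-in `fetch` could serve as the page fetcher, for example `crawl(startUrl, async (u) => (await fetch(u)).text())`, and the JSON text could be saved with `fs.writeFileSync("items.json", toJson(items))`. Relative `href` values would first need resolving against the page URL, for instance with `new URL(href, pageUrl)`. The `maxPages` cap guards against next-page links that loop back on themselves.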
Date of Conference: 26-27 March 2022
Date Added to IEEE Xplore: 18 August 2023
Conference Location: Hengyang, China
