Research and Simulation of Improved Topic Web Crawler Algorithm based on Deep Learning | IEEE Conference Publication | IEEE Xplore

Abstract:

This article analyzes the limitations of traditional topic crawlers and, on that basis, compares depth-first and breadth-first crawling strategies to construct an improved topic URL crawling strategy. Regular expressions and web page selectors are used to locate actionable positions in a webpage, and the program simulates human operations at those positions in order to obtain more topic-related URLs and webpage content. Finally, an experimental process for the topic crawler is established, the improved topic web crawler algorithm is designed, and the experimental results are compared and analyzed. The results show that the improved URL crawling strategy proposed in this paper can greatly reduce the total number of URLs crawled, shorten the crawling time, and improve the efficiency with which a single crawler retrieves target topic web pages.
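
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general ideas it names: a breadth-first versus depth-first frontier, regular-expression link extraction, and a topic filter that limits which URLs are queued. Everything specific here is an assumption made for illustration and is not taken from the paper: the in-memory PAGES graph (standing in for real HTTP fetches), the crawl and extract_links helpers, and the rule that a URL is on topic when it contains the word "sports".

```python
import re
from collections import deque

# Hypothetical in-memory "web" (URL -> HTML); stands in for real HTTP fetches.
PAGES = {
    "/home": '<a href="/sports/news">Sports</a> <a href="/about">About</a>',
    "/sports/news": '<a href="/sports/football">Football</a> <a href="/ads">Ads</a>',
    "/sports/football": '<a href="/sports/news">Back</a>',
    "/about": '<a href="/contact">Contact</a>',
    "/contact": "",
    "/ads": '<a href="/ads/more">More</a>',
    "/ads/more": "",
}

HREF_RE = re.compile(r'href="([^"]+)"')   # regex used to locate links in a page
TOPIC_RE = re.compile(r"sports")          # assumed topic test, applied to the URL


def extract_links(html):
    """Return every href value found in the given HTML string."""
    return HREF_RE.findall(html)


def crawl(seed, strategy="bfs", topic_only=False):
    """Visit pages reachable from seed and return them in visit order.

    strategy="bfs" takes URLs from the front of the frontier (queue);
    strategy="dfs" takes them from the back (stack).
    topic_only=True queues only links that match TOPIC_RE.
    """
    frontier = deque([seed])
    seen = {seed}
    visited = []
    while frontier:
        url = frontier.popleft() if strategy == "bfs" else frontier.pop()
        visited.append(url)
        for link in extract_links(PAGES.get(url, "")):
            if link in seen:
                continue  # already queued or visited
            if topic_only and not TOPIC_RE.search(link):
                continue  # skip off-topic URLs before they enter the frontier
            seen.add(link)
            frontier.append(link)
    return visited


if __name__ == "__main__":
    print("BFS, no filter:   ", crawl("/home", "bfs"))
    print("DFS, no filter:   ", crawl("/home", "dfs"))
    print("BFS, topic filter:", crawl("/home", "bfs", topic_only=True))
```

On this toy graph the unfiltered crawl visits all seven pages under either strategy, while the filtered crawl visits only the seed and the two pages whose URLs match the topic pattern. This illustrates how pruning the frontier can lower the total number of URLs crawled; it says nothing about the magnitude of the improvement reported in the paper.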
Date of Conference: 16-17 June 2023
Date Added to IEEE Xplore: 09 August 2023
Conference Location: Dharwad, India
