Crawler by Inference | IEEE Conference Publication | IEEE Xplore

Abstract:

With a million new pages added every day, the already gigantic web is growing exponentially. This growth challenges search engines and traditional information retrieval methods in producing relevant results, and it equally challenges the crawler, which does the background job of traversing the web's hyperlink structure to obtain a snapshot of the web. Traditional crawlers face the challenges of maintaining the right traversal data structure and tracking already visited pages. Contemporary applications require context- and domain-specific crawlers that harvest the right set of pages and data. A focused crawler needs domain-specific evaluation parameters to select and crawl pages based on relevance. In this paper, we propose a novel model, Crawler by Inference, that achieves these objectives using semantic similarity, paradigmatic similarity, and rules of inference. The proposed methodology prioritizes links by the number of new rules built or discovered. The model proposes an efficient data structure, an intelligent queue, which holds the links in priority order. The analysis data produced for a page can also serve as metadata for that page. The paper also presents results in comparison with a traditional crawler. The model promises better results by avoiding crawls of irrelevant pages.
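
The abstract does not specify how the intelligent queue is implemented. As a rough illustration only, a priority-ordered frontier that also tracks already seen pages might look like the sketch below. The class name `IntelligentQueue`, the `new_rule_count` score, and the example URLs are assumptions made for this sketch and are not taken from the paper; a standard binary heap stands in for whatever structure the authors actually use.

```python
import heapq
import itertools


class IntelligentQueue:
    """Hypothetical priority frontier: links whose pages yielded more
    newly discovered rules are popped first. Not the paper's design."""

    def __init__(self):
        self._heap = []                    # entries: (-new_rule_count, insertion_order, url)
        self._seen = set()                 # URLs already enqueued, to avoid revisits
        self._counter = itertools.count()  # tie-breaker: FIFO among equal scores

    def push(self, url, new_rule_count):
        """Enqueue url unless it was seen before; return True if added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-new_rule_count, next(self._counter), url))
        return True

    def pop(self):
        """Return (url, new_rule_count) for the highest-priority link."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


# Usage with made-up URLs and rule counts (assumed values, not paper data).
queue = IntelligentQueue()
queue.push("https://example.org/a", 2)
queue.push("https://example.org/b", 5)
queue.push("https://example.org/c", 2)
queue.push("https://example.org/a", 9)  # ignored: URL was already seen
order = [queue.pop()[0] for _ in range(len(queue))]
```

In this sketch a link's priority is fixed when it is first enqueued; a crawler that re-scores links as new rules are inferred would need an update or re-insert step, which is left out here.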
Date of Conference: 07-15 February 2020
Date Added to IEEE Xplore: 01 September 2020
Conference Location: Rajpura, India
