Crawler by Inference

Abstract:

With a million new pages added every day, the already gigantic web is growing exponentially. This growth challenges search engines and traditional information retrieval methods in producing relevant results. It also challenges the crawler, which works in the background, traversing the web's hyperlink structure to obtain a snapshot of the web. Traditional crawlers face the problems of maintaining the right traversal data structure and tracking already visited pages. Contemporary applications require context- and domain-specific crawlers that harvest the right set of pages and data. A focused crawler needs domain-specific evaluation parameters to assess pages by relevance and crawl the right set of them. In this paper, we propose a novel model, Crawler by Inference, that achieves these objectives using semantic similarity, paradigmatic similarity, and rules of inference. The proposed methodology prioritizes links by the number of new rules built or discovered. The model introduces an efficient data structure, an intelligent queue, which holds links in priority order. The analysis data produced for a page can also serve as metadata for that page. The paper also compares its results with those of a traditional crawler. The model promises better results by avoiding the crawl of irrelevant pages.
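The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of one possible reading of the "intelligent queue": a priority frontier that pops the link with the most newly discovered rules first and remembers every URL it has ever queued. All names (`IntelligentQueue`, `crawl`, `get_links`, `extract_rules`) are hypothetical. Rule extraction, which the paper bases on semantic similarity, paradigmatic similarity, and rules of inference, is stubbed as a function returning a set of rule strings. Two further assumptions: each out-link inherits the number of new rules its parent page contributed, and a page that contributes no new rules is treated as irrelevant, so its links are not followed.

```python
import heapq
import itertools
from typing import Callable, Dict, Iterable, List, Set, Tuple


class IntelligentQueue:
    """Hypothetical crawl frontier: a max-priority queue plus a seen-set."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._order = itertools.count()  # tie-breaker: FIFO among equal priorities
        self._seen: Set[str] = set()     # every URL ever enqueued, so none is queued twice

    def push(self, url: str, priority: int) -> bool:
        """Enqueue url unless it was seen before; return True if it was added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), url))
        return True

    def pop(self) -> str:
        """Remove and return the highest-priority URL."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    extract_rules: Callable[[str], Set[str]],
    max_pages: int = 100,
) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Best-first crawl; return the visit order and per-page metadata (new rules)."""
    queue = IntelligentQueue()
    known_rules: Set[str] = set()
    metadata: Dict[str, Set[str]] = {}
    order: List[str] = []
    for seed in seeds:
        queue.push(seed, 0)
    while queue and len(order) < max_pages:
        url = queue.pop()
        order.append(url)
        new_rules = extract_rules(url) - known_rules  # rules this page adds
        known_rules |= new_rules
        metadata[url] = new_rules  # analysis result doubles as page metadata
        if not new_rules:
            continue  # nothing new: treat the page as irrelevant, skip its links
        for link in get_links(url):
            # Children inherit the parent's count of newly discovered rules.
            queue.push(link, len(new_rules))
    return order, metadata


# Toy link graph and per-page rule sets (hypothetical data).
graph = {"seed": ["a", "b"], "a": ["c"], "b": ["d", "a"],
         "c": ["seed"], "d": ["e"], "e": []}
rules = {"seed": {"r1"}, "a": {"r2"}, "b": {"r3", "r4", "r5"},
         "c": {"r6"}, "d": {"r1"}, "e": {"r7"}}

order, meta = crawl(["seed"], graph.__getitem__, rules.__getitem__)
print(order)  # ['seed', 'a', 'b', 'd', 'c']
```

In the toy run, `d` is queued after `c` but crawled first because its parent `b` contributed three new rules. `d` itself adds nothing new, so its link to `e` is never followed, and the cycle back to `seed` is ignored because the queue has already seen it. The `(-priority, counter, url)` tuple follows the priority-queue pattern in the Python `heapq` documentation: negation turns the min-heap into a max-priority queue, and the counter breaks ties in insertion order.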

Published in: 2020 Indo – Taiwan 2nd International Conference on Computing, Analytics and Networks (Indo-Taiwan ICAN)

Date of Conference: 07-15 February 2020

Date Added to IEEE Xplore: 01 September 2020

DOI: 10.1109/Indo-TaiwanICAN48429.2020.9181364

Conference Location: Rajpura, India
