Abstract:
Link context analysis has been widely explored as a way to determine the context of a target web page. Most researchers, however, have considered only descriptive (meaningful) anchor text and have ignored non-descriptive anchor text. An analysis of the World Wide Web indicates that a good percentage of web pages can be reached by following non-descriptive anchor text. This paper therefore proposes and implements an algorithm for link context determination (LCD) that determines the context of non-descriptive anchor text. A corpus of web pages belonging to a common domain was considered first. The pages were then manually analyzed to discover the relations between the anchor text and the words in its vicinity. Based on these relationships, a number of rules were formed and represented as a tree. The proposed LCD architecture has three components: (1) the Stanford parser, (2) the rules, and (3) link context determination. An input sentence is given to the Stanford parser, which creates a parse tree for it. The link context determiner then uses this parse tree together with the appropriate rule tree to determine the link context. The approach has been implemented and validated on a limited sample of non-descriptive anchor texts (ATs). The results show that the proposed LCD extracted the actual link context of every non-descriptive AT considered.
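The parse-tree-plus-rules pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parse tree is written by hand rather than produced by the Stanford parser, the sentence is invented, and the single rule ("take the nearest noun-phrase sibling while walking up from the anchor") is hypothetical and is not one of the paper's rules, which the abstract does not list.

```python
# Toy parse tree: (label, child, ...) for phrases, (POS, word) for leaves.
# Assumption: in the described pipeline a parser would produce this tree;
# it is hand-written here so the sketch has no external dependencies.
TREE = (
    "S",
    ("VP",
        ("VB", "Download"),
        ("NP", ("DT", "the"), ("NN", "conference"), ("NN", "schedule")),
        ("ADVP", ("RB", "here"))),
)


def is_leaf(node):
    """A leaf is a (POS, word) pair whose second element is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def words(node):
    """Collect the words under a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    out = []
    for child in node[1:]:
        out.extend(words(child))
    return out


def path_to_word(node, target, path=()):
    """Return the nodes from the root down to the leaf holding `target`."""
    if is_leaf(node):
        return path + (node,) if node[1].lower() == target.lower() else None
    for child in node[1:]:
        found = path_to_word(child, target, path + (node,))
        if found:
            return found
    return None


def link_context(tree, anchor):
    """Hypothetical rule: walk up from the anchor and return the words of
    the first NP sibling encountered. Not a rule taken from the paper."""
    path = path_to_word(tree, anchor)
    if path is None:
        return None
    for depth in range(len(path) - 1, 0, -1):
        parent, current = path[depth - 1], path[depth]
        for sibling in parent[1:]:
            if sibling is not current and sibling[0] == "NP":
                return " ".join(words(sibling))
    return None


print(link_context(TREE, "here"))  # the conference schedule
```

For the non-descriptive anchor "here", the sketch walks from the adverb phrase up to the verb phrase and picks the neighbouring noun phrase as the link context. A fuller version would presumably select among several rules from the rule tree depending on the syntactic position of the anchor.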
Published in: Proceedings of 3rd International Conference on Reliability, Infocom Technologies and Optimization
Date of Conference: 08-10 October 2014
Date Added to IEEE Xplore: 22 January 2015