Abstract:
The World Wide Web gives every individual access to an abundance of information, but it becomes progressively more difficult to discover the significant pieces of information. Web mining tries to tackle this problem by applying data mining techniques to Web data and documents in order to extract knowledge from web sources. The data available on the web is so heterogeneous and so large that extracting the data pertinent to a particular problem becomes a crucial task. This paper focuses on detecting and extracting templates from heterogeneous web pages by means of an algorithm. Locality sensitive hashing finds the similarity between the input web documents and, in terms of execution time, performs well compared to the Minimum Description Length (MDL) principle and the hash cluster process.
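To make the idea of locality sensitive hashing over web documents concrete, the following is a minimal sketch and not the paper's algorithm. It assumes pages are represented by shingles of their opening tag names, that similarity is estimated with MinHash signatures, and that candidate pairs are found by banding those signatures. The function names (`tag_shingles`, `minhash_signature`, `lsh_candidate_pairs`) and all parameter values are hypothetical choices made for illustration.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations


def tag_shingles(html, k=3):
    """Return the set of k-grams over the page's sequence of opening tag names.

    Assumption: tag structure, not text, is what identifies a template.
    """
    tags = [t.lower() for t in re.findall(r"<\s*([a-zA-Z][a-zA-Z0-9]*)", html)]
    return {" ".join(tags[i:i + k]) for i in range(len(tags) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, varied by an integer seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingles, num_hashes=32):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return [
        min((_hash(seed, s) for s in shingles), default=0)
        for seed in range(num_hashes)
    ]


def lsh_candidate_pairs(signatures, bands=8):
    """Group pages whose signatures agree on at least one band of rows."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs
```

Only pages that share a bucket are compared further, so the number of pairwise comparisons stays far below the quadratic cost of checking every pair of pages. Pages reported as candidates would then be passed to a separate template extraction step, which this sketch does not cover.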
Date of Conference: 13-14 February 2014
Date Added to IEEE Xplore: 08 September 2014