This paper focuses on data preprocessing for Web usage mining (WUM). WUM applies data mining procedures to analyze user access to Web sites. As with any knowledge discovery and data mining (KDD) process, WUM contains three main steps: preprocessing, knowledge extraction, and results analysis. The preprocessing step tries to determine the exact list of users who accessed the Web site and to reconstitute user sessions, that is, the sequence of actions each user performed at the Web site. The preprocessing uses the log files from the Web servers together with the Web site map; the log files are joined and, for privacy reasons, anonymized. The data preprocessing involves data fusion, data cleaning, data structuration, and data summarization. It not only reduces the log file size but also increases the quality of the available data through the new data structures.
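To make the four preprocessing stages concrete, the following minimal Python sketch joins several server logs, anonymizes the host field, removes irrelevant requests, and groups the remaining requests into sessions. It is an illustration under stated assumptions, not the method of this paper: the Common Log Format input, the truncated SHA-256 pseudonym, the list of ignored file suffixes, the 30-minute session timeout, and the names `parse_line` and `preprocess` are all hypothetical choices made for the example. The Web site map is not used here.

```python
import hashlib
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2000:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
# Assumed cleaning rule: requests for embedded resources are not user actions.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
# Assumed threshold: a longer gap between requests starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one log line; return None if it does not match the format."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # Anonymization (illustrative only): replace the host by a short hash.
    user = hashlib.sha256(match["host"].encode()).hexdigest()[:12]
    when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {"user": user, "time": when,
            "url": match["url"], "status": int(match["status"])}


def preprocess(log_files):
    """Turn several logs (each a list of lines) into a list of sessions."""
    # Data fusion: join the records of all log files.
    records = [r for lines in log_files for r in map(parse_line, lines) if r]
    # Data cleaning: drop failed requests and embedded resources.
    records = [r for r in records
               if r["status"] < 400
               and not r["url"].lower().endswith(IGNORED_SUFFIXES)]
    # Data structuration: order by user and time, then split on the timeout.
    records.sort(key=lambda r: (r["user"], r["time"]))
    sessions = []
    for r in records:
        last = sessions[-1] if sessions else None
        if (last and last["user"] == r["user"]
                and r["time"] - last["end"] <= SESSION_TIMEOUT):
            last["urls"].append(r["url"])
            last["end"] = r["time"]
        else:
            sessions.append({"user": r["user"], "start": r["time"],
                             "end": r["time"], "urls": [r["url"]]})
    # Data summarization: one compact record per session.
    return sessions
```

Calling `preprocess` with one list of lines per server returns one dictionary per session, holding the pseudonymous user, the start and end times, and the ordered list of requested URLs. Hashing the host is only a simple stand-in for anonymization; a real deployment would need a stronger scheme.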