To address several problems in traditional data preprocessing for web log mining, this article uses an improved data preprocessing technique. At the user identification stage, an identification strategy based on the referring web page is adopted, which is more effective than the traditional strategy based on web site topology. At the session identification stage, a strategy that combines a fixed a priori threshold with session reconstruction is introduced: the initial session set is first built using the fixed a priori threshold, and that initial set is then optimized by session reconstruction. Experiments show that the improved technique enhances the quality of the data preprocessing results.
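The two-step session identification described above can be illustrated with a minimal sketch. The abstract gives neither the threshold value nor the reconstruction rule, so everything specific here is an assumption made for illustration: the `Request` record layout, the 1800-second default threshold, and a reconstruction rule that starts a new session whenever a request's referrer is not a page already visited in the current session. It is not the authors' algorithm.

```python
from collections import namedtuple

# Hypothetical record layout: the abstract does not specify the log fields.
# `time` is in seconds; `referrer` is an empty string when the log has none.
Request = namedtuple("Request", "user time url referrer")


def split_by_threshold(requests, threshold=1800):
    """Step 1: build the initial session set for one user.

    A new session starts when the gap between consecutive requests
    exceeds `threshold` seconds (1800 is an assumed value).
    """
    sessions = []
    for req in sorted(requests, key=lambda r: r.time):
        if sessions and req.time - sessions[-1][-1].time <= threshold:
            sessions[-1].append(req)
        else:
            sessions.append([req])
    return sessions


def reconstruct(sessions):
    """Step 2: refine the initial sessions (assumed rule, see lead-in).

    Within a session, a request whose referrer is non-empty and was not
    visited earlier in that session is taken to start a new session.
    """
    result = []
    for session in sessions:
        current = [session[0]]
        seen = {session[0].url}
        for req in session[1:]:
            if req.referrer and req.referrer not in seen:
                result.append(current)
                current, seen = [], set()
            current.append(req)
            seen.add(req.url)
        result.append(current)
    return result
```

For one user's requests, `reconstruct(split_by_threshold(requests))` returns a list of sessions, each a list of `Request` records in time order. The threshold pass only looks at time gaps, so the second pass is where referrer information can separate visits that happen to fall close together in time.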