A Web Page De-duplication Algorithm Based on Data Clearing | IEEE Conference Publication | IEEE Xplore