Detection of near duplicate web pages using four stage algorithm | IEEE Conference Publication | IEEE Xplore