Document image dataset indexing and compression using connected components clustering | IEEE Conference Publication | IEEE Xplore