E-commerce applications generate and consume a tremendous amount of online information, which is typically available as textual documents. Organizations and individuals generally use category sets or hierarchies to organize, archive, and access their documents. At the same time, they constantly acquire relevant documents from various Internet sources, each of which may organize its documents in a category set or hierarchy different from the one used by the acquirer. Consequently, integrating source documents organized in one category hierarchy into the existing category hierarchy of the acquiring organization or individual has become an important issue in the e-commerce era. Existing category-integration techniques are designed mainly to integrate document catalogs that are organized nonhierarchically (i.e., as flat sets). In this paper, we propose a clustering-based category-hierarchy integration (CHI) technique, which extends the clustering-based category-integration (CCI) technique. Our empirical evaluation results show that the proposed CHI technique appears to improve the effectiveness of category-hierarchy integration compared with nonhierarchical category-integration techniques, particularly in homogeneous and comparable scenarios.