Does the Order Matter? A Random Generative Way to Learn Label Hierarchy for Hierarchical Text Classification | IEEE Journals & Magazine | IEEE Xplore


Abstract:

Hierarchical Text Classification (HTC) is an essential and challenging task due to the difficulty of modeling the label hierarchy. Recent generative methods have achieved state-of-the-art performance by flattening the local label hierarchy into a label sequence with a specific order. However, labels have no natural order, and the generation of the current label should incorporate the information in all other target labels. Moreover, generative methods usually suffer from error accumulation. To this end, we propose a new framework named sequence-to-label (Seq2Label), which learns the label hierarchy for hierarchical text classification in a random generative way. Instead of using only one specific order, we shuffle the label sequence with a Label Sequence Random Shuffling (LSRS) mechanism, so that each text is mapped to several differently ordered label sequences during the training phase. To alleviate error accumulation, we further propose a Hierarchy-aware Negative Sampling (HNS) strategy with a negative label-aware loss to better distinguish target labels from negative labels. In this way, our model can capture the hierarchical and co-occurrence information of the target labels of each text. Experimental results on three benchmark datasets show that Seq2Label achieves state-of-the-art results.
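The shuffling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how many orderings are sampled or whether shuffling is constrained by hierarchy level, so the function name `shuffled_label_sequences`, the `num_orders` parameter, and the example labels are all hypothetical. The sketch simply maps one text's target labels to several distinct random orderings.

```python
import random


def shuffled_label_sequences(labels, num_orders, seed=None):
    """Return up to `num_orders` distinct random orderings of `labels`.

    Hypothetical helper: an unconstrained shuffle is assumed here; the
    paper's LSRS mechanism may differ in its details.
    """
    rng = random.Random(seed)
    seen = set()
    sequences = []
    # Cap the attempts so that small label sets, which admit only a few
    # permutations, still terminate.
    max_attempts = 20 * num_orders
    for _ in range(max_attempts):
        if len(sequences) == num_orders:
            break
        candidate = list(labels)
        rng.shuffle(candidate)
        key = tuple(candidate)
        if key not in seen:
            seen.add(key)
            sequences.append(candidate)
    return sequences


# Example: one text with three (hypothetical) target labels.
target_labels = ["News", "Sports", "Football"]
for sequence in shuffled_label_sequences(target_labels, num_orders=3, seed=0):
    print(sequence)
```

Each returned sequence contains the same label set, so a model trained on all of them sees every target label in varying positions rather than in one fixed order.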
Page(s): 276 - 285
Date of Publication: 01 November 2023

I. Introduction

Hierarchical text classification (HTC) is an important subtask of multi-label text classification (MLC) [1] and is widely used in news classification [2], advertising systems [3], information retrieval [4], fine-grained entity typing [5], etc. Unlike MLC, HTC aims to assign each document to one or more node-paths in a taxonomic hierarchy structure. The taxonomic hierarchy structure is always represented as a tree or a directed acyclic graph [6], as depicted in Fig. 1.
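To make the notion of a node-path concrete, the following sketch represents a small taxonomy as a child-to-parent map and recovers the path from the root to a given label. It assumes a tree (each label has at most one parent), not a general directed acyclic graph, and the taxonomy contents and the names `PARENT` and `node_path` are hypothetical illustrations.

```python
# Hypothetical toy taxonomy: each label maps to its parent (None for the root).
PARENT = {
    "News": None,
    "Sports": "News",
    "Football": "Sports",
    "Politics": "News",
}


def node_path(label, parent):
    """Return the list of labels from the root down to `label`."""
    path = [label]
    # Walk upward until the root, whose parent is None, is reached.
    while parent.get(label) is not None:
        label = parent[label]
        path.append(label)
    return path[::-1]


print(node_path("Football", PARENT))  # ['News', 'Sports', 'Football']
```

Under this representation, assigning a document to "Football" implies the whole node-path, which is what distinguishes HTC from flat multi-label classification.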
