
Automated Social Text Annotation With Joint Multilabel Attention Networks


Abstract:

Automated social text annotation is the task of suggesting a set of tags for documents shared on social media platforms. Automated annotation can reduce users' cognitive overhead in tagging and improve tag management for better search, browsing, and recommendation of documents. The task can be formulated as a multilabel classification problem. We propose a novel deep learning-based method for this problem and design an attention-based neural network with semantic-based regularization, which mimics users' reading and annotation behavior to build better document representations while leveraging the semantic relations among labels. The network models the title and the content of each document separately and injects an explicit, title-guided attention mechanism into each sentence. To exploit the correlations among labels, we propose two semantic-based loss regularizers, i.e., similarity and subsumption, which enforce the output of the network to conform to label semantics. The model with the semantic-based loss regularizers is referred to as the joint multilabel attention network (JMAN). We conducted a comprehensive evaluation study and compared JMAN to state-of-the-art baseline models on four large, real-world social media data sets. In terms of F1, JMAN significantly outperformed the bidirectional gated recurrent unit (Bi-GRU) by around 12.8%-78.6% (relative) and the hierarchical attention network (HAN) by around 3.9%-23.8%. JMAN also demonstrates advantages in convergence and training speed. Further performance improvements were observed over latent Dirichlet allocation (LDA) and support vector machines (SVMs). Applying the semantic-based loss regularizers also boosted the F1 of HAN and Bi-GRU. We also found that dynamically updating the label semantic matrices (JMAN-d) has the potential to further improve the performance of JMAN, but at a substantial memory cost, and warrants further study.
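The two semantic-based loss regularizers named in the abstract could be sketched as follows. This is a minimal illustration under assumptions about their form (the paper's exact equations are not reproduced in this excerpt): the similarity term penalizes divergent predicted probabilities for semantically similar label pairs, and the subsumption term penalizes a child label being predicted more strongly than a label that subsumes it. The function names and matrix encodings are hypothetical.

```python
import numpy as np

def similarity_regularizer(probs, sim):
    """Sketch of a similarity regularizer (hypothetical form).
    probs: (n_labels,) predicted label probabilities.
    sim:   (n_labels, n_labels) symmetric label-similarity matrix in [0, 1].
    Penalizes pairs of similar labels whose predicted probabilities differ."""
    diff = probs[:, None] - probs[None, :]   # pairwise probability differences
    return float(np.sum(sim * diff ** 2) / 2.0)  # halve to count each pair once

def subsumption_regularizer(probs, sub):
    """Sketch of a subsumption regularizer (hypothetical form).
    sub[i, j] = 1 if label i subsumes (is broader than) label j.
    Penalizes a subsumed label scoring higher than its broader label."""
    excess = np.maximum(probs[None, :] - probs[:, None], 0.0)  # [i, j] = max(p_j - p_i, 0)
    return float(np.sum(sub * excess ** 2))
```

In training, such terms would be added to the multilabel classification loss with weighting coefficients; the static variant of JMAN would keep `sim` and `sub` fixed, while the dynamic variant (JMAN-d) would update them during training.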
Page(s): 2224 - 2238
Date of Publication: 25 June 2020

PubMed ID: 32584774

I. Introduction

Tagging is a popular approach to organizing resources on many social media platforms, allowing users to share and annotate resources with their own vocabularies. In academic social bookmarking systems, such as Bibsonomy (http://bibsonomy.org) and CiteULike (http://citeulike.org), tags are used to organize academic publications; on social question-and-answering (Q&A) sites, such as Quora (http://quora.com), StackOverFlow (https://stackoverflow.com), and Zhihu (https://zhihu.com/), tags are associated with questions for better search and recommendation; in microblogging services, such as Twitter (https://twitter.com), tags take the form of hashtags that provide alternative access points to tweets. These accumulated tags are commonly referred to as folksonomies, which have been used for organizing online resources [1], browsing [2], semantic-based search and recommendation [3], and learning knowledge structures [4]. Tags have also been reported to have higher descriptive and discriminative power than other textual features, such as titles, descriptions, and comments, for document classification [5]. Fig. 1 shows an example of a published article and its associated tags on Bibsonomy.

Fig. 1. Example of a document and its associated metadata and tags on Bibsonomy. The metadata consists of the title and the content (i.e., the abstract of the article). Tags are highlighted with a red box.
