Improving Health Mention Classification of Social Media Content Using Contrastive Adversarial Training

Health mention classification (HMC) involves the classification of an input text as a health mention or not. Figurative and non-health mentions of disease words make the classification task challenging. Learning the context of the input text is the key to this problem. The idea is to learn the representation of a word from its surrounding words and to utilize emojis in the text to help improve the classification results. In this paper, we improve the word representation of the input text using adversarial training that acts as a regularizer during fine-tuning of the model. We generate adversarial examples by perturbing the word embeddings of the model and then train the model on a pair of clean and adversarial examples. Additionally, we utilize a contrastive loss that tries to learn similar representations for the clean example and its perturbed version. We train and evaluate the method on three public datasets. Experiments show that contrastive adversarial training improves the F1-score over the RoBERTa Large baseline on all three datasets and over the BERT Large baseline on two of the three datasets. Furthermore, we provide a brief analysis of the results by utilizing the power of explainable AI.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Nadeem Iqbal.

Health mention classification (HMC) deals with the classification of a given piece of text as a health mention or not. This helps in the early detection and tracking of a pandemic, which enables health departments and authorities to manage resources and control the situation. The input text is gathered from social media platforms such as Twitter, Facebook, and Reddit. The collection process involves crawling these platforms based on keywords containing disease names. Keyword-based data collection does not consider the context of the text and hence yields irrelevant data. For example, the tweet ''I made such a great bowl of soup I think I cured my own depression'' (taken from Twitter) contains the disease word ''depression'', but it is used figuratively. Another tweet from Twitter, ''Hearing people cough makes me angry. I cannot explain it'', contains ''cough'', but this does not show that a person has a cough. Non-health and figurative mentions of disease words in these cases pose challenges to the HMC task. So, the question arises of how to address these challenges. One way is to consider the words surrounding the disease words, which give the context of the text. Another way is to leverage the emojis in the text, as figurative mentions may contain smileys, whereas actual disease mentions may contain emojis such as sad faces. Transformer methods [1] are good at capturing the contextual meanings of words and have shown success in many natural language processing (NLP) tasks. BERT [2] is a transformer model pre-trained on a large unlabelled text corpus for language understanding, and can be fine-tuned on downstream tasks such as text classification [3]. It considers the words on the left and right sides of a given word while learning a representation for it.
In this way, it achieves a contextual representation of a given word. BERT randomly masks 15% of the tokens in the corpus and then tries to predict the masked tokens during training. RoBERTa [4] improves over BERT by using dynamic masking, in which a new masking pattern is generated each time a sequence is fed to the model, instead of a static mask fixed during preprocessing. Further, it is trained on roughly ten times as much data as BERT. Existing health mention classification methods use both non-contextual and contextual representations of the given text [5], [6], [7], [8], [9]. However, contextual representations have improved classifier performance over non-contextual representations. Some methods use the emojis present in the tweet text for the classification task. [5] extracts sentiment information from the given tweet and passes it as an additional feature alongside the textual features. [9] converts emojis into text using the Python emoji library (footnote 2) and then uses this emoji text as part of the tweet text.
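The difference between static and dynamic masking described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions: the helper `mask_tokens`, the example sentence, and the seeds are hypothetical, and every selected token is replaced by the mask symbol, whereas the actual BERT recipe leaves some selected tokens unchanged or swaps them for random tokens.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, rate=0.15, rng=None):
    """Return a copy of `tokens` with about `rate` of the positions masked."""
    rng = rng or random.Random()
    n_mask = max(1, round(rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]

# Hypothetical 13-token example sentence.
tokens = "hearing people cough makes me angry i cannot explain it at all today".split()

# Static masking: the mask is drawn once during preprocessing and reused every epoch.
static_view = mask_tokens(tokens, rng=random.Random(0))
static_epochs = [static_view for _ in range(3)]

# Dynamic masking: a fresh mask is drawn each time the sequence is fed to the model.
rng = random.Random(0)
dynamic_epochs = [mask_tokens(tokens, rng=rng) for _ in range(3)]

for epoch, view in enumerate(dynamic_epochs):
    print(epoch, " ".join(view))
```

With dynamic masking the model sees different prediction targets for the same sentence across epochs, which is the motivation given for RoBERTa's change.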
Adversarial training (AT) [10] works as a regularizer and improves the robustness of the model against adversarial examples. The key idea is to add a gradient-based perturbation to the input examples and then train the model on both clean and perturbed examples. In contrast to images, this technique is not directly applicable to text data, because text consists of discrete tokens. [11] therefore applies perturbations to word embeddings for the task of text classification. [12] utilizes a contrastive loss for learning features in computer vision (CV). The idea is that the input image is perturbed by applying an augmentation, and during training the contrastive loss pulls the clean and augmented examples together while pushing other examples away from them. Contrastive loss helps the model learn noise-invariant image feature representations. [13] proposes contrastive adversarial training for text classification, which improves performance over the baseline methods. In this work, we propose contrastive adversarial training for the task of HMC, additionally using a contrastive loss during the fine-tuning of the two transformer models. Specifically, we add perturbations to the embedding matrix of BERT and RoBERTa using the Fast Gradient Sign Method (FGSM) [10]. Then we train the model on both the clean and perturbed training examples simultaneously. Our method outperforms the RoBERTa Large baseline on all three public datasets and the BERT Large baseline on two of them. Generally, deep learning models are regarded as black boxes, i.e., it is not clear what information in the input influences the models to make their decisions. The European Union adopted new regulations to implement a ''right to explanation'', which means a user can ask for an explanation of a decision made by an algorithm [14]. Explainable AI focuses on explaining the decisions made by algorithms. In this paper, we leverage explainable AI capabilities to visualize the words that contribute to the model decision.
The main contributions of this paper are:
• We propose contrastive adversarial training as a regularizer for HMC and evaluate the performance of the proposed method on three public datasets.
• We show that our method improves HMC performance over the existing methods on three public datasets.
• We provide an analysis of the decisions of our best-performing model, i.e., RoBERTa, by leveraging the power of explainable AI.

The rest of the paper is organized as follows: In section II, we discuss the related work, whereas in section III we present our method for HMC. In section IV, we give the experimental details. In section V, we present the results and analysis of the experiments. In section VI, we provide the conclusion of the paper.

2 https://pypi.org/project/emoji/

II. RELATED WORK
In this section, we discuss existing work in the literature related to adversarial training, contrastive learning, and health mention classification of tweets.

A. ADVERSARIAL TRAINING
Adversarial training (AT) has been studied in many supervised classification tasks such as object detection [15], [16], [17], object segmentation [17], [18] and image classification [10], [19], [20]. AT is the process of training the model to defend against malicious ''attacks'' and increase network robustness. AT involves training the model simultaneously with adversarial and clean examples. These malicious attacks are generated by perturbing the original input examples so that the model predicts the wrong class label [21], [22] for them. FGSM, proposed in [10], is a method for generating adversarial examples for images. [11] extends FGSM to NLP tasks by perturbing word embeddings instead of the original text inputs, and applies the method to both supervised and semi-supervised settings, with Virtual Adversarial Training (VAT) [23] for the latter. Recent works propose to add perturbations to the attention mechanism of transformer-based methods [24], [25], [26]. In contrast to single-step FGSM, [21] applies a multi-step approach to generate adversarial examples, which proves more effective but increases the computational cost due to the inner loop that iteratively calculates the perturbations. [27] proposes free adversarial training, where the inner loop calculates the perturbation as well as the gradients with respect to the model parameters and updates the model parameters. [26] also uses the free AT algorithm and adds gradient accumulation to achieve a larger effective batch size. It also applies perturbations to the word embeddings of LSTM and BERT-based models, similar to [11]. In our work, we generate adversarial examples using one-step FGSM and perform contrastive learning with clean examples to learn the representations for the input examples.

B. CONTRASTIVE LEARNING
Self-supervised contrastive learning methods, such as MoCo [28], SimCLR [12], and Barlow Twins [29], have narrowed the performance gap between self-supervised learning and fully-supervised methods on the ImageNet [30] dataset. Contrastive learning has also been applied successfully in the NLP domain. The main idea of contrastive learning is to create positive pairs to train the models. Various methods have been used to create these pairs. [31] uses back-translation to generate another view of the input data. [32] uses word and span deletion, reordering, and substitution of words, whereas [33] crops and masks sequences from an auxiliary Transformer to create positive pairs. [34] performs supervised contrastive learning [35] by treating training examples of the same class as positive pairs. To generate positive examples, [36] uses different dropout masks on the same data and treats premises and their corresponding hypotheses as positive pairs and contradictions as hard negatives in the NLI datasets [37], [38]. In our work, we train an original input and its adversarial example in parallel. We further use Barlow Twins [29] as an additional contrastive loss during fine-tuning of models to learn similar representations for the original and its adversarial example.

VOLUME 10, 2022

C. HEALTH MENTION CLASSIFICATION
[7] presents a new method, namely Word Embedding Space Partitioning and Distortion (WESPAD), for health mention classification on Twitter data. WESPAD first learns to partition and then distort word representations, which acts as a regularizer and adds generalization capabilities to the model. This method also addresses the problem of having few training examples for the positive health mentions in the dataset. Although this method improves the classification accuracy, distorting the original word embeddings causes information loss. [6] uses non-contextual word embeddings for tweet health classification. It applies preprocessing to the given tweet, extracts non-contextual word representations from it, and then passes these representations to Long Short-Term Memory networks (LSTMs) [39]. The LSTM-based classifier outperforms Support Vector Machines (SVM), K-Nearest Neighbor (KNN), and Decision Trees. [8] uses a two-step approach for tweet classification. First, it detects whether the disease word is mentioned figuratively or not, and then it uses this information as a new binary feature combined with other features and applies a convolutional neural network (CNN) for the classification. The usage of this additional feature improves the classification results. This method does not work well on figurative mention tweets, especially for the disease term ''heart attack'', one of the most widely used terms in the figurative sense. [5] adds 14k new tweets to the existing health-mention dataset ''PHM2017'' [7]. It also uses emojis by converting them into string representations using the Python emoji library (footnote 3). As a preprocessing step, it normalizes the URLs and user mentions in the tweet. This work experiments with non-contextual representations such as word2vec [40] as well as with contextual representations like ELMO [41] and BERT [2], and incorporates sentiment information using WordNet [42], VAD [43], and ULMFit [44]. It combines the output of the Bi-LSTM [45] with sentiment information to produce a final binary output that represents the classification result. Experiments show that combining BERT and VAD outperforms other methods. [9] uses a permutation-based word representation method [46] for health mention classification and leverages the emojis as a part of the tweet text by converting them into a text representation. [47] presents a new dataset of Reddit posts called the Reddit health mention dataset (RHMD) and classifies a given post as a health mention or not by combining the symptom or disease terms with user behavior. [48] presents a COVID-19 personal health mention (PHM) dataset containing labeled tweets and proposes a dual CNN for the detection of health mention tweets. The dual CNN consists of a primary network called P-Net and an auxiliary network called A-Net, where A-Net helps P-Net to alleviate the class-imbalance issue.

3 https://pypi.org/project/emoji/
In this paper, we exploit the adversarial training combined with contrastive learning on the task of HMC. For this purpose, we generate adversarial examples using FGSM and employ Barlow Twins [29] as a contrastive loss. We evaluate our method on 3 public datasets.

III. METHOD
In this section, we describe the basics of the transformer-based encoder for text classification. Then we discuss adversarial training and contrastive loss. Finally, we discuss how to combine these ideas to improve the HMC score. Figure 1 shows the overall architecture of the model.
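The classification loss is the cross-entropy loss computed over a batch:

$$ L_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \, \log p_{i,c} \tag{1} $$

where $C$ denotes the number of classes in the dataset, $N$ is the number of training examples in a batch, $y_{i,c}$ is 1 if example $i$ belongs to class $c$ and 0 otherwise, and $p_{i,c}$ is the predicted probability of class $c$ for example $i$.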
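As a small worked example of the classification loss, the sketch below computes a batch loss for $N = 2$ examples and $C = 3$ classes, assuming the loss is the usual softmax cross-entropy. The logits and labels are made-up values, and the function names are illustrative rather than taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one example's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, labels):
    """Mean negative log-probability of the true class over the N examples."""
    n = len(batch_logits)
    loss = 0.0
    for logits, y in zip(batch_logits, labels):
        loss -= math.log(softmax(logits)[y])
    return loss / n

# Hypothetical batch: N = 2 examples, C = 3 classes.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 2]
loss = cross_entropy(batch_logits, labels)

# A model that spreads probability uniformly over C classes has loss log(C).
uniform = cross_entropy([[0.0, 0.0, 0.0]], [1])
print(loss, uniform)
```

The uniform case gives a useful reference point: a confident and correct classifier should score well below $\log C$.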

B. ADVERSARIAL TRAINING
AT involves perturbing the inputs to the model so that they cause misclassifications. FGSM was proposed by [10] to generate such perturbed examples. The model is trained on both clean and adversarial examples in parallel, which improves the model's robustness against adversarial attacks. Let $r$ be a small perturbation to the input example $x_i$, and $y_i$ be the ground truth. Then we maximize the loss function:

$$ \max_{r} \; L(f_\theta(x_i + r), y_i) \quad \text{s.t.} \quad \|r\|_\infty \le \epsilon \tag{2} $$

where $L(f_\theta(x_i + r), y_i)$ is the loss function, $f_\theta$ is the neural network parameterized by $\theta$, and $\epsilon > 0$ bounds the size of the perturbation.

To produce the perturbation $r$, Equation (2) can be simplified with the linear approximation of FGSM [10] as follows:

$$ r = \epsilon \, \mathrm{sign}\left(\nabla_{x_i} L(f_\theta(x_i), y_i)\right) \tag{3} $$

To generate adversarial examples, similar to [11], we perturb the embedding matrix $E \in \mathbb{R}^{d_v \times d_h}$, where $d_h$ is the hidden unit size and $d_v$ is the vocabulary size of the transformer model $M$. At the end of each forward pass, we calculate the gradient of the loss function given in Equation (1) with respect to the embedding matrix $E$, instead of the input examples as in Equation (3), to obtain the perturbation. We add this perturbation to the embedding matrix, and the network goes through another forward pass using the adversarial example. Finally, we calculate another classification loss for the adversarial example.
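The perturbation step can be sketched on a toy model. This is only an illustration under stated assumptions: a mean-pooled linear classifier stands in for the transformer so that the gradient with respect to the embedding matrix can be written analytically, and the step size `epsilon`, the weights, and all function names are hypothetical. The FGSM step itself is the standard one, adding `epsilon` times the sign of the loss gradient.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def forward(E, w, b, token_ids, y):
    """Mean-pool token embeddings, apply a linear layer, return (loss, p)."""
    T = len(token_ids)
    h = [sum(E[t][k] for t in token_ids) / T for k in range(len(w))]
    z = sum(wk * hk for wk, hk in zip(w, h)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss, p

def embedding_gradient(E, w, p, token_ids, y):
    """Analytic gradient of the loss with respect to every row of E."""
    grad = [[0.0] * len(w) for _ in E]
    for t in token_ids:
        for k, wk in enumerate(w):
            grad[t][k] += (p - y) * wk / len(token_ids)
    return grad

def fgsm_perturb(E, grad, epsilon):
    """Add epsilon * sign(gradient) to each embedding entry."""
    return [[e + epsilon * sign(g) for e, g in zip(row, grow)]
            for row, grow in zip(E, grad)]

# Hypothetical 4-word vocabulary with 2-dimensional embeddings.
E = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.1], [0.0, 0.6]]
w, b = [1.0, -2.0], 0.1
token_ids, y = [1, 3, 1], 1
epsilon = 0.05  # assumed step size, not a value from the paper

clean_loss, p = forward(E, w, b, token_ids, y)          # first forward pass
grad = embedding_gradient(E, w, p, token_ids, y)
E_adv = fgsm_perturb(E, grad, epsilon)
adv_loss, _ = forward(E_adv, w, b, token_ids, y)        # second forward pass
print(clean_loss, adv_loss)
```

Rows of the embedding matrix for words that do not occur in the input receive a zero gradient and therefore stay unchanged, while the loss on the perturbed embeddings is higher than on the clean ones.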

C. CONTRASTIVE LEARNING
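Given a pair of clean and perturbed examples, we want their representations to be similar to each other, while learning different representations for examples that are not from the same pair. To learn such representations, we leverage contrastive learning as a part of the fine-tuning process. To this end, we employ the Barlow Twins loss proposed by Zbontar et al. [29], which is based on the redundancy reduction principle. The Barlow Twins loss is given as follows:

$$ L_{ctr} = \sum_{i} (1 - M_{ii})^2 + \beta \sum_{i} \sum_{j \neq i} M_{ij}^2 \tag{4} $$

where $L_{ctr}$ is the Barlow Twins loss, $\sum_{i} (1 - M_{ii})^2$ and $\sum_{i} \sum_{j \neq i} M_{ij}^2$ represent the invariance and redundancy reduction terms, respectively, and $\beta$ is the trade-off parameter between the two terms. $M$ is a square matrix that holds the cross-correlation between the clean example embeddings ($E^{clean}$) and the adversarial example embeddings ($E^{perturbed}$). Values of $M$ vary between $-1$ (representing a perfect anti-correlation) and $+1$ (representing a perfect correlation). $M_{ij}$ is computed as follows:

$$ M_{ij} = \frac{\sum_{b} E^{clean}_{b,i} \, E^{perturbed}_{b,j}}{\sqrt{\sum_{b} \left(E^{clean}_{b,i}\right)^2} \, \sqrt{\sum_{b} \left(E^{perturbed}_{b,j}\right)^2}} \tag{5} $$

where $b$ indexes the examples in a batch and $i$, $j$ index the embedding dimensions.

Similar to [13], we take the weighted average of the two classification losses (for the clean example and its adversarial example) and the contrastive loss $L_{ctr}$ as given below:

$$ L = \frac{1 - \lambda}{2} \left( L_{CE}^{clean} + L_{CE}^{adv} \right) + \lambda \, L_{ctr} \tag{6} $$

where $L_{CE}^{clean}$ and $L_{CE}^{adv}$ are the classification losses on the clean and adversarial examples, $\lambda$ controls the weightage of the losses, and $L$ represents the total loss.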
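The Barlow Twins loss and the weighted combination of losses can be sketched in a few lines. This is an illustration under assumptions rather than the authors' implementation: the embeddings are mean-centred along the batch dimension before the cross-correlation is computed (as in the Barlow Twins formulation), the combination follows the weighted-average form used in [13], and the values of `beta`, `lam`, and the toy embeddings are hypothetical.

```python
import math

def center(X):
    """Subtract the per-feature batch mean from every row."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def cross_correlation(A, B):
    """M[i][j]: correlation of feature i of A with feature j of B over the batch.
    Assumes no feature is constant over the batch (non-zero denominators)."""
    n, d = len(A), len(A[0])
    M = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            num = sum(A[b][i] * B[b][j] for b in range(n))
            den = (math.sqrt(sum(A[b][i] ** 2 for b in range(n)))
                   * math.sqrt(sum(B[b][j] ** 2 for b in range(n))))
            M[i][j] = num / den
    return M

def barlow_twins_loss(E_clean, E_perturbed, beta):
    M = cross_correlation(center(E_clean), center(E_perturbed))
    d = len(M)
    invariance = sum((1.0 - M[i][i]) ** 2 for i in range(d))
    redundancy = sum(M[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return invariance + beta * redundancy

def total_loss(ce_clean, ce_adv, l_ctr, lam):
    """Weighted average of the two classification losses and the contrastive loss."""
    return (1.0 - lam) / 2.0 * (ce_clean + ce_adv) + lam * l_ctr

# Hypothetical batch of 4 examples with 2-dimensional, decorrelated embeddings.
E_clean = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
E_perturbed = [[1.1, 0.9], [0.9, -1.0], [-1.0, 1.1], [-1.1, -0.9]]
beta, lam = 0.005, 0.5  # assumed values for illustration

l_same = barlow_twins_loss(E_clean, E_clean, beta)
l_pair = barlow_twins_loss(E_clean, E_perturbed, beta)
print(l_same, l_pair, total_loss(0.4, 0.6, l_pair, lam))
```

When the two views are identical and their features are decorrelated, the cross-correlation matrix is the identity and the loss is zero; any mismatch between the clean and perturbed embeddings makes it positive.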

IV. EXPERIMENTS
In this section, we first discuss the datasets used for training and evaluating our method. Then we give the preprocessing and training details of the method.

A. DATASETS
We use three datasets to train and evaluate contrastive adversarial training. These datasets can be accessed at https://github.com/pervaizniazi/HMCDatasets. The details of each dataset are given as follows:

1) PHM2017
This dataset is an extended version of the PHM2017 dataset provided by [5]. We split the dataset into 65%, 15%, and 20% portions for the train, validation, and test sets, respectively. This dataset contains data related to 10 diseases, namely, Alzheimer's, cancer, cough, depression, fever, headache, heart attack, migraine, Parkinson's, and stroke. There were 15,742 tweets at download time, out of which 4,228 tweets were health mentions (HM), whereas 7,322 and 4,192 tweets were non-health mentions (NHM) and figurative mentions (FM), respectively.
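The class counts and the 65/15/20 split can be checked with a short sketch. The shuffling, the seed, and the `split_indices` helper are assumptions for illustration; the text does not say how examples were assigned to the splits (for example, whether the split was stratified by class).

```python
import random

def split_indices(n, fractions, seed=0):
    """Shuffle indices 0..n-1 and cut them into train / validation / test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Class counts reported for the extended PHM2017 dataset.
counts = {"HM": 4228, "NHM": 7322, "FM": 4192}
total = sum(counts.values())

train, val, test = split_indices(total, (0.65, 0.15, 0.20))
print(total, len(train), len(val), len(test))
```

Under these assumptions the three class counts add up to the reported 15,742 tweets, and the split yields 10,232 training, 2,361 validation, and 3,149 test examples.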

2) COVID-19 PHM
This dataset contains tweets related to COVID-19 for the HMC task, where every tweet is labeled with one of four categories, i.e., self-mention, other-mention, awareness, and non-health. There were 9,219 tweets available at the time of download. Following [48], we split the data into train, validation, and test sets in the proportion 8:1:1. Similar to [48], we combine the self-mention, other-mention, and non-health categories to tackle the class imbalance issue.

3) RHMD
The RHMD dataset contains 10,015 posts from the Reddit platform [47]. Every post contains one of 15 disease or symptom terms: migraine, asthma, diabetes, PTSD, depression, cough, addiction, Alzheimer's, OCD, headache, fever, allergy, cancer, stroke, and heart attack. Every post is labeled with one of four categories, i.e., personal health mention (PHM), non-personal health mention (NPHM), figurative mention (FM), and hyperbolic mention. The public version of the dataset combines the figurative and hyperbolic mention classes.

B. PREPROCESSING
Each tweet goes through the preprocessing pipeline before being passed to the model. We first convert the emojis in the tweet to text using the Python emoji library (footnote 2). Then we remove all user mentions, URLs, hashtags, and special characters. This preprocessing makes the emojis a part of the tweet text.
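A minimal sketch of such a preprocessing pipeline is shown below. Several details are assumptions: a two-entry dictionary stands in for the emoji library (whose `emoji.demojize` function provides the full emoji-to-text mapping), hashtags are removed together with their text, and "special characters" is read as everything other than letters, digits, and whitespace. The regular expressions and the example tweet are illustrative only.

```python
import re

# Tiny stand-in for an emoji-to-text table (assumption); the emoji package
# provides emoji.demojize() for the complete mapping.
EMOJI_TO_TEXT = {
    "\U0001F602": " face with tears of joy ",
    "\U0001F622": " crying face ",
}

def preprocess(tweet):
    # 1) Turn emojis into words so they become part of the tweet text.
    for symbol, name in EMOJI_TO_TEXT.items():
        tweet = tweet.replace(symbol, name)
    # 2) Remove URLs, then user mentions and hashtags.
    tweet = re.sub(r"https?://\S+", " ", tweet)
    tweet = re.sub(r"[@#]\w+", " ", tweet)
    # 3) Remove remaining special characters and collapse whitespace.
    tweet = re.sub(r"[^A-Za-z0-9\s]", " ", tweet)
    return re.sub(r"\s+", " ", tweet).strip()

example = "@bob this cough is killing me \U0001F602 #sick https://t.co/abc123"
cleaned = preprocess(example)
print(cleaned)  # this cough is killing me face with tears of joy
```

Converting the emoji before stripping special characters matters here: if the order were reversed, the emoji would be deleted and the figurative cue it carries would be lost.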

C. TRAINING DETAILS
We conduct experiments using BERT Large and RoBERTa Large as baseline models. Then we apply contrastive adversarial training using these models. For all the experiments, we set a fixed learning rate of 1e-5 and fine-tune the models for 10 epochs. For BERT Large

V. RESULTS AND ANALYSIS
We fine-tune two transformer models, namely BERT Large and RoBERTa Large, and use these models as the baselines for the task of HMC. For contrastive adversarial training, we use these models with three losses, i.e., two classification losses (for clean and adversarial examples) and a contrastive loss, and take the weighted average of these losses. Table 1 shows the test set results on the three datasets for baseline and contrastive adversarial training (denoted as AT + Ctr). On the PHM2017 dataset, BERT + AT + Ctr improves the performance over the baseline by 1.23% and 1.5% in terms of macro F1-score and micro F1-score, respectively. RoBERTa + AT + Ctr improves the macro and micro F1-scores by 0.30% and 1.0%, respectively, on the PHM2017 dataset. On the RHMD dataset, RoBERTa + AT + Ctr improves the macro and micro F1-scores by 1.0% and 1.33%, respectively, over the baseline training method. However, BERT + AT + Ctr degrades the performance relative to the baseline in terms of both macro and micro F1-scores on the RHMD dataset. On the COVID-19 PHM dataset, BERT + AT + Ctr and RoBERTa + AT + Ctr improve the macro F1-score by 0.62% and 4.14%, respectively, over their baseline methods. The micro F1-scores improve by 0.5% and 4.5% for BERT + AT + Ctr and RoBERTa + AT + Ctr, respectively, over their baseline methods on the COVID-19 PHM dataset.
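Since the results are reported as both macro and micro F1-scores, the following sketch shows how the two differ on a small, made-up set of predictions. The labels and predictions are hypothetical and are not taken from Table 1; the function name is illustrative.

```python
def f1_scores(y_true, y_pred, labels):
    """Return (macro F1, micro F1) for single-label multi-class predictions."""
    per_class = []
    tp_all = fp_all = fn_all = 0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    macro = sum(per_class) / len(labels)                 # unweighted mean over classes
    micro = 2 * tp_all / (2 * tp_all + fp_all + fn_all)  # pooled over all decisions
    return macro, micro

# Hypothetical predictions with an over-predicted majority class (NHM).
y_true = ["HM", "HM", "NHM", "NHM", "NHM", "NHM", "FM"]
y_pred = ["HM", "NHM", "NHM", "NHM", "NHM", "NHM", "NHM"]
macro, micro = f1_scores(y_true, y_pred, ["HM", "NHM", "FM"])
print(round(macro, 4), round(micro, 4))
```

Macro F1 weights every class equally, so the missed minority class pulls it down, while micro F1 pools all decisions and equals the accuracy in this single-label setting. This is why the two scores can move by different amounts for the same model.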
In Figure 2, we plot the embeddings on the validation sets of all three datasets for the baseline and contrastive adversarial training of our best-performing model, i.e., RoBERTa. We reduce the learned embeddings to lower dimensions using principal component analysis (PCA). The embedding plots show that different embeddings are learned for the baseline and contrastive adversarial training. In Figure 3, we plot the receiver operating characteristic (ROC) curves for the test sets of all three datasets for the baseline and contrastive adversarial training. Figures 3a and 3b visualize the ROC curves on the PHM2017 dataset for BERT and RoBERTa, respectively. As shown in Figure 3a, the area under the ROC curve (AUC) for BERT + AT + Ctr is higher than for the BERT baseline. However, the AUC for RoBERTa + AT + Ctr is slightly lower than for the baseline method, as shown in Figure 3b. For the COVID-19 PHM dataset, the AUC of contrastive adversarial training is higher than that of the baseline methods for both the BERT and RoBERTa models, as shown in Figure 3c and Figure 3d, respectively. As our task on the RHMD dataset is multi-class classification, we plot one-vs-rest ROC curves for it. As shown in Figure 3e, the AUC of the BERT baseline is higher than that of its contrastive adversarial training. The AUC of RoBERTa + AT + Ctr is higher for FM vs rest as compared to the baseline model, as shown in Figure 3f. However, for the other classes, the AUC of the baseline methods is higher than that of contrastive adversarial training.
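The one-vs-rest AUC values discussed for the multi-class RHMD task can be computed as sketched below. The sketch uses the rank interpretation of AUC (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half). The predicted probabilities and labels are made-up values, not results from the paper.

```python
def roc_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(probs, y_true, classes):
    """Treat each class in turn as positive and all other classes as negative."""
    return {
        c: roc_auc([row[k] for row in probs], [y == c for y in y_true])
        for k, c in enumerate(classes)
    }

# Hypothetical class probabilities for five posts (columns: PHM, NPHM, FM).
classes = ["PHM", "NPHM", "FM"]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.3, 0.3, 0.4],
]
y_true = ["PHM", "PHM", "NPHM", "FM", "NPHM"]
auc = one_vs_rest_auc(probs, y_true, classes)
print(auc)  # {'PHM': 1.0, 'NPHM': 0.75, 'FM': 1.0}
```

Because each class gets its own curve, a model can gain AUC on one class (such as FM vs rest) while losing it on the others, which matches the mixed picture described for Figures 3e and 3f.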

A. COMPARISON OF OUR METHOD WITH OTHER WORK
In Table 2, we compare the performance of our method with that of L. Lu et al. [48] on the COVID-19 PHM dataset in the binary classification setting. Our method performs better than the method of L. Lu et al. [48] in terms of F1-score. However, this is not a fair comparison due to the mismatch between the samples used in the two experiments. In Table 3, we compare our results with those of Naseem et al. [47] on the RHMD dataset. Our contrastive adversarial training method improves precision, recall, and F1-score over the method of Naseem et al. [47] for both BERT and RoBERTa. RoBERTa with contrastive adversarial training improves precision, recall, and F1-score by 2.43%, 2.27%, and 2.23%, respectively, over the method of Naseem et al. [47]. We present the comparison of our method with some of the work in the literature on the PHM2017 dataset in Table 4. Our method improves precision, recall, and F1-score as compared to the work in the literature. RoBERTa + AT + Ctr achieves a precision, recall, and F1-score of 94.25%, 94.35%, and 94.3%, respectively. In Table 5, we present results that show whether adversarial training alone or the additional contrastive loss improves the model's performance. The results show that adversarial training improves the F1-score over the baseline method on two of the three datasets. Adding the contrastive loss further improves the F1-score over adversarial training alone on all three datasets.

B. VISUALIZING THE INFLUENTIAL WORDS
Deep learning models are black boxes in nature, i.e., it is unclear which features of the input influence the deep learning model to reach a decision. Hence, the use of deep learning in critical applications such as healthcare is questionable. The European Union announced new regulations to implement a ''right to explanation'', which means a user can ask for the factors contributing to the decision of the deep learning model. Explainable AI [49] focuses on presenting the internals of the model in a human-understandable way to explain the factors influencing the model decision. In particular, various methods explain the model decision by feature, neuron, and layer importance, also known as attribution algorithms [50]. In this paper, we visualize the important words that influence the model in reaching the classification decision using the transformers-interpret library [51], which is based on the Integrated Gradients algorithm [52]. The Integrated Gradients algorithm starts from a baseline input that carries no information, moves gradually along a straight path from this baseline to the actual input, and accumulates the gradients of the prediction along the way. The accumulated gradient for each word, scaled by the difference between the input and the baseline, gives the influence of that word on the prediction. In Table 6, we show some randomly selected examples from the test sets of the three datasets and analyze the importance of words in the classification decision of the best-performing model, i.e., RoBERTa + AT + Ctr. The first tweet example from the PHM2017 dataset, ''just finished rolling my post depression joint so that I can smoke after my therapist session tomorrow'', is HM and is classified by RoBERTa + AT + Ctr as HM. Words like ''rolling, post, depression, after, and session'' influence the model towards classifying this tweet as HM. The words ''join and so'' contribute towards the NHM classification. The RoBERTa baseline wrongly classifies this tweet as NHM.
For the baseline, the words ''just, so, and smoke'' push the prediction towards NHM, whereas the words ''finished, depression, join, and therapist'' oppose the NHM prediction. The tweet ''I just straightened my hair out of depression wow look at me'' is classified correctly as NHM by RoBERTa + AT + Ctr. The words ''I, just, hair, depression, and wow'' influence the model to predict the tweet as NHM, whereas words such as ''straightened, of, look, at, me'' influence it towards HM. On the other hand, the RoBERTa baseline wrongly predicts it as HM, and words such as ''wow, look, straightened'' oppose this decision. Similarly, we show examples from the other datasets as well.
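The Integrated Gradients idea behind these word-level attributions can be sketched on a toy model. This is a simplified illustration under assumptions: a three-feature logistic model replaces the transformer, the baseline is the all-zero input, and the path integral is approximated with a midpoint Riemann sum. In the text setting the inputs are token embeddings, and per-dimension attributions are typically aggregated into one score per token. All names and numbers are hypothetical.

```python
import math

def model(x, w, b):
    """Toy classifier: probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def gradient(x, w, b):
    """Gradient of the model output with respect to the input features."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate the path integral of gradients from baseline to x."""
    total = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint of the s-th sub-interval
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        total = [t + g for t, g in zip(total, gradient(point, w, b))]
    return [(xi - b0) * t / steps for xi, b0, t in zip(x, baseline, total)]

w, b = [1.5, -2.0, 0.5], 0.0     # hypothetical weights
x = [1.0, 0.5, 2.0]              # hypothetical input features
baseline = [0.0, 0.0, 0.0]       # input that carries no information

attributions = integrated_gradients(x, baseline, w, b)
print(attributions)
print(sum(attributions), model(x, w, b) - model(baseline, w, b))
```

The attributions sum (up to the approximation error) to the difference between the prediction for the input and the prediction for the baseline. Positive values mark features that push the prediction towards the class and negative values mark features that oppose it, which is how the supporting and opposing words in Table 6 can be read.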
Experimental results show that our method of contrastive adversarial training performs better than the baselines and other methods in the literature. Our method acts as a regularization technique that improves the generalization of the model. However, the amount of perturbation and weightage of the contrastive loss should be chosen carefully as perturbation distorts the embedding matrix, and overuse of perturbation may hurt the performance of the model.

VI. CONCLUSION
In this paper, we utilized contrastive adversarial training as a regularizer for the health mention classification task. We experimented with two transformer models, i.e., BERT Large and RoBERTa Large, as baselines, and incorporated contrastive adversarial training into these models as well. We evaluated the performance of these methods on three public datasets. Results show that contrastive adversarial training as a regularization technique significantly improves the HMC performance over the baseline methods. We visualized some of the examples from the test sets that were correctly classified by the best-performing contrastive adversarial training model and misclassified by its baseline version, in order to understand the classification decisions made by these models.

IMRAN RAZZAK (Senior Member, IEEE) is currently a Senior Lecturer at the School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia. He has published more than 150 papers in reputed journals and conferences. He is the author of one book and the inventor of one patent on face recognition. He has attracted research grants of 1.2 million AUD and has successfully delivered several research projects. His research interests include machine learning and its applications, spanning a broad range of topics. He has applied machine learning methods, with an emphasis on natural language processing and image analysis, to solve real-world problems related to health, finance, and social media.
ANDREAS DENGEL received the Diploma degree in CS from TUK and the Ph.D. degree from the University of Stuttgart. He is currently a Scientific Director of DFKI GmbH, Kaiserslautern. In 1993, he became a Professor in computer science at TUK, where he holds the Chair Knowledge-Based Systems. Since 2009, he has been appointed as a Professor (Kyakuin) with the Department of Computer Science and Information Systems, Osaka Prefecture University. He also worked at IBM, Siemens, and Xerox Parc. He is a member of several international advisory boards, has chaired major international conferences, and founded several successful start-up companies. He is the co-editor of international computer science journals and has written or edited 12 books. He is the author of more than 300 peer-reviewed scientific publications and supervised more than 170 master's and Ph.D. theses. He is a fellow of IAPR and received many prominent international awards. His main scientific emphasis is in the areas of pattern recognition, document understanding, information retrieval, multimedia mining, semantic technologies, and social media.
SHERAZ AHMED received the M.S. and Ph.D. degrees in computer science from TUK, Germany, under the supervision of Prof. H. C. Andreas Dengel and Dr. Habil. Marcus Liwicki. His Ph.D. topic is generic methods for information segmentation in document images. Over the last few years, he has primarily worked on development of various systems for information segmentation in document images. He is currently a Senior Researcher at DFKI GmbH, Kaiserslautern, where he is leading the area of time series analysis and life science. His research interests include document understanding, generic segmentation framework for documents, pattern recognition, anomaly detection, gene analysis, medical image analysis, and natural language processing. He has more than 80 publications on the said and related topics, including three journal articles and two book chapters. He is a Frequent Reviewer of various journals and conferences, including Pattern Recognition Letters, Neural Computing and Applications, IJDAR, ICDAR, ICFHR, and DAS.