HealthLies: Dataset and Machine Learning Models for Detecting Fake Health News


Abstract:

Current datasets and models focusing on health fake news identification are few and far between and primarily based on COVID-19. In this paper, we introduce a new health news-specific dataset called HealthLies, which includes 11,001 facts and myths about diseases such as COVID-19, Cancer, Polio, Zika, HIV/AIDS, SARS, and Ebola collected from a wide range of sources. We train several machine learning models, including KNN, SVM, Logistic Regression, Naive Bayes, an MLP Classifier, and a deep learning model based on the state-of-the-art Natural Language Processing (NLP) BERT model, which we name BERT-HealthLies. We find that BERT-HealthLies typically achieves the highest accuracy across models, though other models may be preferable in some real-time applications due to their orders of magnitude faster prediction and training times. In addition, ensembling BERT-HealthLies with the other models performs up to 12% better than BERT-HealthLies alone when identifying fake news related to a new disease for which we do not yet have training data.
Date of Conference: 15-18 August 2022
Date Added to IEEE Xplore: 27 September 2022
Conference Location: Newark, CA, USA
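
The abstract does not say how BERT-HealthLies is combined with the other models. As a rough illustration only, the sketch below assumes simple majority voting over per-model labels; the keyword-based `toy_*` functions are hypothetical stand-ins for trained classifiers and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is any function mapping a claim (text) to a label.
Classifier = Callable[[str], str]


def majority_vote(classifiers: Sequence[Classifier], text: str) -> str:
    """Return the label predicted by the most classifiers.

    Ties are broken in favor of the earliest classifier in the sequence,
    so placing the strongest model first makes it the tie-breaker.
    """
    votes = [clf(text) for clf in classifiers]
    counts = Counter(votes)
    best = max(counts.values())
    # First vote (in classifier order) whose label has the top count.
    return next(label for label in votes if counts[label] == best)


# Hypothetical stand-ins for trained models (assumption: not from the paper).
def toy_bert(text: str) -> str:
    return "myth" if "cure" in text.lower() else "fact"


def toy_svm(text: str) -> str:
    return "myth" if "miracle" in text.lower() else "fact"


def toy_nb(text: str) -> str:
    return "myth" if "5g" in text.lower() else "fact"


if __name__ == "__main__":
    models = [toy_bert, toy_svm, toy_nb]
    for claim in ("Miracle cure for Zika", "Vaccines reduce polio transmission"):
        print(claim, "->", majority_vote(models, claim))
```

Listing the strongest model first is a deliberate choice in this sketch: when the votes split evenly, its label wins, so the ensemble falls back to that model's judgment instead of an arbitrary label.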

I. Introduction

The spread and negative impact of fake news have grown significantly over the past few years, with dangerous consequences for both the democratic process [1] and public health. For example, the WHO listed vaccine hesitancy as one of the top threats to global health in 2019 [2].
