Annotators’ Selection Impact on the Creation of a Sentiment Corpus for the Cryptocurrency Financial Domain

Inter-rater reliability coefficients (Fleiss's Kappa, Krippendorff's Alpha, and Gwet's AC1) for sentiment analysis data set labeling by UTAD and IE annotators. Each bar r...

Abstract:

Well-labeled natural language corpus data is essential for most natural language processing techniques, especially in specialized fields. However, cohort biases remain a significant challenge in machine learning. Drawing data samples or human annotators from a narrow set of cohorts is a prevalent issue for machine learning researchers because it can introduce bias into the final product. During the development of the CryptoLin corpus for another research project, the authors became concerned that cohort bias might influence the selection of annotators. Therefore, this paper addresses whether cohort diversity improves the labeling result by repeating the annotation process with two annotator cohorts and comparing the outcomes with a statistically robust methodology. Statistical tests, such as the Chi-Square test of independence on absolute frequency tables, together with confidence intervals for the Kappa point estimates, enable a rigorous analysis of the differences between Kappa estimates. Furthermore, a two-proportion z-test is applied to compare the accuracy scores of UTAD and IE annotators for several pre-trained models, including Vader Sentiment Analysis, TextBlob Sentiment Analysis, the Flair NLP library, and FinBERT Financial Sentiment Analysis with BERT. The paper uses Cryptocurrency Linguo (CryptoLin), a corpus containing 2683 cryptocurrency-related news articles spanning more than three years, and compares two different selection criteria for the annotators. CryptoLin was annotated twice with discrete values representing negative, neutral, and positive news. The first annotation was done by twenty-seven annotators from the same cohort. Each news title was randomly assigned and blindly annotated by three human annotators. The second annotation was carried out by eighty-three annotators from three cohorts. Eac...
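The statistics named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' code, and every function name below is hypothetical. `fleiss_kappa` computes Fleiss's Kappa from an items-by-categories count matrix (here, three categories: negative, neutral, positive). The abstract does not say how the Kappa confidence intervals were built, so `bootstrap_kappa_ci` assumes a percentile bootstrap over news titles as one plausible choice. `chi_square_independence` runs Pearson's test on an absolute frequency table (for example, two annotator cohorts by three labels) and, to stay dependency-free, only returns a p-value for even degrees of freedom, which covers the 2 x 3 case (df = 2). `two_proportion_z_test` is the pooled two-sided test one could apply to the accuracy of a pre-trained model measured against each cohort's labels. The demo data are invented for illustration.

```python
import math
import random


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters who put item i in category j.
    Every row must sum to the same number of raters n (n >= 2)."""
    n_items, n_cats = len(counts), len(counts[0])
    n = sum(counts[0])
    if any(sum(row) != n for row in counts):
        raise ValueError("every item needs the same number of ratings")
    # Observed agreement for each item, then its mean P_bar.
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / n_items
    # Chance agreement P_e from the marginal category proportions.
    p_cats = [sum(row[j] for row in counts) / (n_items * n) for j in range(n_cats)]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)


def bootstrap_kappa_ci(counts, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Fleiss' kappa, resampling items (assumed method)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [counts[rng.randrange(len(counts))] for _ in counts]
        try:
            stats.append(fleiss_kappa(sample))
        except ZeroDivisionError:
            continue  # degenerate resample: every rating in a single category
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


def chi_square_independence(table):
    """Pearson chi-square test of independence on absolute frequencies.
    Returns (statistic, df, p); the closed-form p-value needs an even df."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df % 2:
        raise NotImplementedError("closed-form p-value only for even df")
    # Survival function of chi-square with df = 2m: e^{-x/2} * sum_{i<m} (x/2)^i / i!
    half = stat / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return stat, df, p


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Invented toy data: 5 titles, 3 annotators each, columns = neg/neu/pos.
    toy = [[3, 0, 0], [0, 2, 1], [1, 1, 1], [0, 0, 3], [0, 3, 0]]
    print("kappa:", fleiss_kappa(toy))
    print("95% CI:", bootstrap_kappa_ci(toy))
    print("chi2:", chi_square_independence([[20, 10, 10], [10, 20, 10]]))
    print("z-test:", two_proportion_z_test(60, 100, 40, 100))
```

A library such as SciPy or statsmodels would normally supply the chi-square distribution and these tests; the closed-form survival function is used here only to keep the sketch self-contained, and it is exact for even degrees of freedom.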

Published in: IEEE Access (Volume: 11)

Page(s): 131081–131088

Date of Publication: 17 November 2023

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2023.3334260
