Evaluating Annotation Consistency in Offensive Language Detection: A Data Analytics Approach on the TweetEval Dataset
Abstract
Most machine learning models depend heavily not only on difficult
datasets but also on the quality of the labeled data they are
trained on, especially for offensive content detection.
In this paper, we study the TweetEval dataset and compare its
ground truth with manually annotated labels, using
inter-annotator agreement as the metric for assessing
annotation consistency. Cohen’s Kappa coefficient
is used to quantify how much each pair of annotators agreed and
where they differed. An in-depth examination of misclassifications
reveals further difficulties with manual labelling: subjective
interpretation, context dependency, and annotator bias. The
insights gathered show how manual annotation can affect
subsequent model training both positively and negatively,
highlighting the importance of standardized annotation guidelines.
The findings contribute to improving offensive
content detection models by advocating dataset reliability and the
reduction of inconsistencies in labeling.
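As an illustration of the pairwise agreement measure named above, the following minimal sketch computes Cohen’s Kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function names `cohen_kappa` and `pairwise_kappa` and the label lists are hypothetical and are not taken from the paper or from TweetEval; the authors’ actual procedure may differ.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def pairwise_kappa(raters):
    """Kappa for every pair of raters in a dict mapping name -> label list."""
    return {
        (name_a, name_b): cohen_kappa(raters[name_a], raters[name_b])
        for name_a, name_b in combinations(raters, 2)
    }


# Hypothetical labels (1 = offensive, 0 = not offensive), for illustration only.
raters = {
    "ground_truth": [1, 0, 1, 1, 0, 0, 1, 0],
    "annotator_1": [1, 0, 1, 0, 0, 0, 1, 1],
    "annotator_2": [1, 1, 1, 1, 0, 0, 0, 0],
}

for pair, kappa in pairwise_kappa(raters).items():
    print(pair, round(kappa, 3))
```

Comparing each annotator against the ground truth and against every other annotator separates disagreement with the dataset from disagreement among the annotators themselves.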
Keywords:
TweetEval Dataset, Annotation Consistency, Inter-Annotator Agreement, Cohen’s Kappa, Offensive Language Detection, Hybrid Models, Annotator Bias
License
Copyright (c) 2025 International Journal on Emerging Research Areas

This work is licensed under a Creative Commons Attribution 4.0 International License.
All published work in this journal is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
