Evaluating Annotation Consistency in Offensive Language Detection: A Data Analytics Approach on the TweetEval Dataset
Abstract
Machine learning models depend heavily not only on challenging
datasets but also on the quality of the labeled data they are
trained on, especially for offensive content detection.
In this paper, we study the TweetEval dataset and compare its
ground-truth labels with manually annotated labels, using
inter-annotator agreement as a metric for assessing annotation
consistency. Cohen’s Kappa coefficient is used to quantify how
much each pair of annotators agreed and where they differed.
An in-depth examination of the misclassifications reveals further
difficulties with manual labelling: subjective interpretation,
context dependency, and annotator bias. These insights show how
manual annotation can have both positive and negative effects on
subsequent model training,
highlighting the importance of standardized annotation guidelines.
Taken together, the findings contribute to improving offensive
content detection models by advocating for dataset reliability and
for reducing inconsistencies in labeling.
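As an illustration of the agreement measure named above, Cohen's Kappa is commonly defined as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of items on which two annotators agree and p_e is the agreement expected by chance from each annotator's label frequencies. The following is a minimal sketch of that calculation for every pair of label sources, not the authors' implementation. The function names, the annotator names, and the label values (1 = offensive, 0 = not offensive) are hypothetical and are not taken from the paper or from TweetEval.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappa(annotations):
    """Kappa for every pair of label sources in a {name: labels} mapping."""
    return {
        (x, y): cohen_kappa(annotations[x], annotations[y])
        for x, y in combinations(annotations, 2)
    }


# Hypothetical labels for eight tweets (1 = offensive, 0 = not offensive).
annotations = {
    "ground_truth": [1, 0, 1, 1, 0, 0, 1, 0],
    "annotator_1": [1, 0, 1, 0, 0, 0, 1, 1],
    "annotator_2": [1, 1, 1, 1, 0, 0, 1, 0],
}

for pair, kappa in pairwise_kappa(annotations).items():
    print(pair, round(kappa, 3))
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance; listing the items where two label sequences differ is a natural next step for the kind of disagreement analysis the abstract describes.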
Keywords:
TweetEval Dataset, Annotation Consistency, Inter-Annotator Agreement, Cohen’s Kappa, Offensive Language Detection, Hybrid Models, Annotator Bias
License
Copyright (c) 2025 International Journal on Emerging Research Areas

This work is licensed under a Creative Commons Attribution 4.0 International License.
All published work in this journal is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
