Evaluating Annotation Consistency in Offensive Language Detection: A Data Analytics Approach on the TweetEval Dataset
Abstract
Most machine learning models depend heavily not only on
challenging datasets but also on the quality of the labeled
data they are trained on, especially for offensive content detection.
In this paper, we study the TweetEval dataset and compare its
ground-truth labels with manually annotated labels, using
inter-annotator agreement as the metric for assessing
annotation consistency. Cohen’s Kappa coefficient
is used to quantify how much each pair of annotators agreed and
where they differed. An in-depth examination of the misclassified
examples reveals further difficulties with manual labelling: subjective
interpretation, context dependency, and annotator bias. The
insights gathered show how manual annotation can affect
subsequent model training both positively and negatively,
highlighting the importance of standardized annotation guidelines.
The findings contribute to improving offensive
content detection models by advocating dataset reliability and the
reduction of labeling inconsistencies.
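As a rough illustration of the agreement measure named in the abstract, the sketch below computes Cohen’s Kappa for every pair of label sources and lists the items on which a pair disagrees. It is a minimal example under stated assumptions, not the authors' pipeline: the label values (1 = offensive, 0 = not offensive), the source names, and the helper functions `cohen_kappa`, `pairwise_kappa`, and `disagreement_indices` are all hypothetical, and the toy labels are invented for demonstration rather than taken from TweetEval.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


def pairwise_kappa(annotations):
    """Return {(name_a, name_b): kappa} for every pair of label sources."""
    return {
        (a, b): cohen_kappa(annotations[a], annotations[b])
        for a, b in combinations(annotations, 2)
    }


def disagreement_indices(labels_a, labels_b):
    """Indices of the items on which two label sources differ."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Hypothetical toy labels (1 = offensive, 0 = not offensive); not real data.
annotations = {
    "ground_truth": [1, 0, 1, 1, 0, 0, 1, 0],
    "annotator_1": [1, 0, 1, 0, 0, 0, 1, 0],
    "annotator_2": [1, 1, 1, 0, 0, 0, 1, 0],
}

for pair, kappa in pairwise_kappa(annotations).items():
    differing = disagreement_indices(annotations[pair[0]], annotations[pair[1]])
    print(pair, round(kappa, 3), "disagree on items", differing)
```

The Kappa value corrects raw agreement for chance, while the index list points to the individual tweets that would need a closer qualitative look, for example to check whether subjective interpretation or missing context explains the mismatch.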
Keywords:
TweetEval Dataset, Annotation Consistency, Inter-Annotator Agreement, Cohen’s Kappa, Offensive Language Detection, Hybrid Models, Annotator Bias
License
Copyright (c) 2025 International Journal on Emerging Research Areas

This work is licensed under a Creative Commons Attribution 4.0 International License.
All published work in this journal is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
