Evaluating Annotation Consistency in Offensive Language Detection: A Data Analytics Approach on the TweetEval Dataset
Abstract
Most machine learning models depend heavily not only on
challenging datasets but also on the quality of the labeled
data they are trained on, especially for offensive content detection.
In this paper, we study the TweetEval dataset to provide a
comparison of its ground truth with manually annotated labels;
inter-annotator agreement is used as the metric for
assessing annotation consistency. Cohen’s Kappa coefficient
is used to quantify how much each pair of annotators agreed and
where they differed. An in-depth examination of misclassifications
reveals further difficulties with manual labeling: subjective
interpretation, context dependency, and annotator bias. The insights
gathered demonstrate how manual annotation can have
positive and negative effects on further model training practices,
highlighting the importance of standardized annotation guidelines.
The findings contribute to enhancing offensive
content detection models by advocating dataset reliability and the
reduction of inconsistencies in labeling.
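
As an illustration of the pairwise agreement measure mentioned above, the following is a minimal sketch of Cohen's Kappa computed for every pair of label sources. It is not the authors' implementation: the function names `cohen_kappa` and `pairwise_kappa`, the binary label encoding, and the toy labels are all assumptions made for this example. Kappa is taken as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two marginal label frequencies, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined here (0/0); returning 1.0 is a convention assumed for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def pairwise_kappa(annotations):
    """Kappa for every pair of named label sequences in a dict."""
    return {
        (x, y): cohen_kappa(annotations[x], annotations[y])
        for x, y in combinations(annotations, 2)
    }


# Hypothetical toy labels (1 = offensive, 0 = not offensive); not data from the paper.
toy = {
    "ground_truth": [1, 0, 1, 1, 0, 0, 1, 0],
    "annotator_1": [1, 0, 1, 0, 0, 0, 1, 0],
    "annotator_2": [1, 1, 1, 0, 0, 0, 1, 1],
}
scores = pairwise_kappa(toy)
for pair, kappa in scores.items():
    print(pair, round(kappa, 3))
```

On these toy labels, the first annotator agrees with the ground truth more closely than the second, which is the kind of pairwise difference the study uses to locate inconsistent annotations.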
Keywords:
TweetEval Dataset, Annotation Consistency, Inter-Annotator Agreement, Cohen’s Kappa, Offensive Language Detection, Hybrid Models, Annotator Bias
License
Copyright (c) 2025 International Journal on Emerging Research Areas

This work is licensed under a Creative Commons Attribution 4.0 International License.
All published work in this journal is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
