A Machine Learning Framework for Tumour Classification Using Transcriptomic and Multi-Omics Datasets
Abstract
Cancer is a biologically heterogeneous disease characterized by molecular alterations across multiple regulatory layers, necessitating robust computational modelling for accurate diagnosis and biomarker discovery. The increasing availability of high-dimensional genomic and multi-omics datasets from large-scale initiatives such as The Cancer Genome Atlas (TCGA) has enabled the development of machine learning approaches for cancer classification. However, challenges including extreme dimensionality, feature redundancy, and class imbalance continue to affect model stability and generalization performance. In this study, we propose a reproducible integrative machine learning framework for tumor versus normal classification and biomarker identification using gene expression and multi-omics TCGA data. The methodology employs Extreme Gradient Boosting (XGBoost) for embedded feature selection to identify the most informative molecular variables from tens of thousands of features. The selected features are subsequently used to train downstream classifiers, including Logistic Regression, Random Forest, and Support Vector Machine models. To ensure unbiased performance estimation and prevent data leakage, a stratified five-fold cross-validation strategy is adopted. Experimental evaluation on breast and lung cancer datasets demonstrates strong discriminative performance, with the XGBoost–Random Forest model achieving mean classification accuracies exceeding 99%, along with high ROC-AUC and Cohen’s Kappa values. Furthermore, multi-omics integration improves classification robustness by capturing complementary molecular signals across biological layers. The results indicate that XGBoost-driven feature selection combined with ensemble learning provides a scalable, interpretable, and effective framework for high-dimensional cancer classification and biomarker discovery.
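The abstract reports Cohen's Kappa alongside accuracy and uses stratified five-fold cross-validation under class imbalance. The following is a minimal, dependency-free sketch of those two ideas, not the authors' implementation: the helper names `stratified_kfold` and `cohen_kappa`, the round-robin fold assignment, and the 90/10 tumor/normal label vector are all hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Return k (train_idx, test_idx) pairs with class ratios kept roughly equal per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold receives its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    splits = []
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train_idx, test_idx))
    return splits


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), i.e. agreement corrected for chance."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined here; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical imbalanced label vector (illustrative only, not TCGA data).
    labels = ["tumor"] * 90 + ["normal"] * 10

    for fold, (train_idx, test_idx) in enumerate(stratified_kfold(labels), start=1):
        composition = Counter(labels[i] for i in test_idx)
        print(f"fold {fold}: test composition = {dict(composition)}")

    # A trivial classifier that always predicts the majority class.
    majority = ["tumor"] * len(labels)
    accuracy = sum(t == p for t, p in zip(labels, majority)) / len(labels)
    print(f"accuracy={accuracy:.2f} kappa={cohen_kappa(labels, majority):.2f}")
```

On this toy label vector, the always-"tumor" predictor reaches 90% accuracy yet has a kappa of zero, which is why a chance-corrected measure is informative when tumor samples far outnumber normal ones. For the cross-validation to guard against leakage, any feature selection step would also need to be refit on the training indices of each fold; that step is omitted here.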
Keywords:
Cancer classification, Multi-omics integration, XGBoost, Biomarker discovery, Machine learning
License
Copyright (c) 2026 International Journal on Emerging Research Areas

This work is licensed under a Creative Commons Attribution 4.0 International License.
All published work in this journal is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
