A Machine Learning Framework for Tumour Classification Using Transcriptomic and Multi-Omics Datasets

Rhea Maria James; Richy Sara George; Sayooj Kumar M; Nihal Muhammed Ayoob; Shan Krishna; Tintu Alphonsa Thomas

Abstract

Cancer is a biologically heterogeneous disease characterized by molecular alterations across multiple regulatory layers, necessitating robust computational modelling for accurate diagnosis and biomarker discovery. The increasing availability of high-dimensional genomic and multi-omics datasets from large-scale initiatives such as The Cancer Genome Atlas (TCGA) has enabled the development of machine learning approaches for cancer classification. However, challenges including extreme dimensionality, feature redundancy, and class imbalance continue to affect model stability and generalization performance. In this study, we propose a reproducible integrative machine learning framework for tumour versus normal classification and biomarker identification using gene expression and multi-omics TCGA data. The methodology employs Extreme Gradient Boosting (XGBoost) for embedded feature selection to identify the most informative molecular variables from tens of thousands of features. The selected features are subsequently used to train ensemble classifiers including Logistic Regression, Random Forest, and Support Vector Machine models. To ensure unbiased performance estimation and prevent data leakage, a stratified five-fold cross-validation strategy is adopted. Experimental evaluation on breast and lung cancer datasets demonstrates strong discriminative performance, with the XGBoost–Random Forest model achieving mean classification accuracies exceeding 99%, along with high ROC-AUC and Cohen’s Kappa values. Furthermore, multi-omics integration improves classification robustness by capturing complementary molecular signals across biological layers. The results indicate that XGBoost-driven feature selection combined with ensemble learning provides a scalable, interpretable, and effective framework for high-dimensional cancer classification and biomarker discovery.
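
The evaluation protocol described in the abstract can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation: a simple mean-difference ranking stands in for XGBoost feature importance, a nearest-centroid rule stands in for the downstream classifiers, and the data are synthetic rather than TCGA. All function names here (`stratified_folds`, `select_features`, `cross_validate`, and so on) are hypothetical. The point it tries to show is the ordering that avoids leakage: folds are stratified by class, and feature selection is refit on the training rows of each fold before the held-out rows are scored.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def class_means(X, y, rows, cols):
    """Per-class mean of the chosen columns, computed on the given rows only."""
    means = {}
    for label in set(y[r] for r in rows):
        members = [r for r in rows if y[r] == label]
        means[label] = [sum(X[r][c] for r in members) / len(members) for c in cols]
    return means


def select_features(X, y, rows, n_keep):
    """Stand-in for importance ranking (assumption): largest gap between class means."""
    cols = list(range(len(X[0])))
    means = class_means(X, y, rows, cols)
    a, b = sorted(means)  # this sketch assumes exactly two classes
    gaps = [abs(means[a][c] - means[b][c]) for c in cols]
    return sorted(cols, key=lambda c: gaps[c], reverse=True)[:n_keep]


def predict(X, centroids, cols, row):
    """Nearest-centroid stand-in (assumption) for the downstream classifier."""
    def dist(label):
        return sum((X[row][c] - m) ** 2 for c, m in zip(cols, centroids[label]))
    return min(centroids, key=dist)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: chance-corrected agreement, (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
              for c in set(y_true) | set(y_pred))
    # Kappa is undefined when p_e == 1; returning 0.0 is a convention of this sketch.
    return 0.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def cross_validate(X, y, k=5, n_keep=2, seed=0):
    """Stratified k-fold CV, refitting selection and model inside each fold."""
    folds = stratified_folds(y, k, seed)
    y_true, y_pred = [], []
    for i, test_rows in enumerate(folds):
        train_rows = [r for j, fold in enumerate(folds) if j != i for r in fold]
        # Selection sees training rows only, so held-out samples cannot leak in.
        cols = select_features(X, y, train_rows, n_keep)
        centroids = class_means(X, y, train_rows, cols)
        for r in test_rows:
            y_true.append(y[r])
            y_pred.append(predict(X, centroids, cols, r))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, cohen_kappa(y_true, y_pred)


def make_toy_data(n_per_class=20, n_features=30, seed=1):
    """Synthetic two-class data: features 0 and 1 are informative, the rest noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 4.0 * label
            row[1] -= 4.0 * label
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()
accuracy, kappa = cross_validate(X, y)
print(f"accuracy={accuracy:.3f}  kappa={kappa:.3f}")
```

On this deliberately easy toy problem both scores should come out close to 1; the numbers say nothing about performance on real expression data. Selecting features once on the full dataset and only then cross-validating would let the held-out samples influence which features are kept, which is the kind of leakage the per-fold refit is meant to avoid.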

Keywords:

Cancer classification, Multi-omics integration, XGBoost, Biomarker discovery, Machine learning

Published

29-05-2026

Issue

Vol. 6 No. 1 (2026): IJERA

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

All published work in this journal is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

How to Cite

[1]

R. M. James, R. S. George, S. K. M, N. M. Ayoob, S. Krishna, and T. A. Thomas, “A Machine Learning Framework for Tumour Classification Using Transcriptomic and Multi-Omics Datasets”, IJERA, vol. 6, no. 1, pp. 271–279, May 2026, Accessed: Jul. 27, 2026. [Online]. Available: https://ijera.in/index.php/IJERA/article/view/353
