Brain cancer remains one of the most aggressive malignancies worldwide. Subtypes such as glioblastoma and ependymoma exhibit markedly different clinical behaviors and treatment responses, making accurate classification essential for diagnosis and therapy. This study analyzed the Brain_GSE50161 transcriptomic dataset, comprising 54,676 gene expression features across 130 samples representing four brain cancer subtypes and normal tissue. After preprocessing and standardization using StandardScaler, a multi-stage feature selection pipeline was applied, combining variance threshold filtering and Recursive Feature Elimination (RFE) to identify the most informative features. Machine learning models, including Logistic Regression and Random Forest classifiers, were developed and optimized using GridSearchCV. Model performance was evaluated using an 80/20 train–test split with 5-fold cross-validation and assessed via accuracy, precision, recall, and F1-score. Feature selection reduced the dataset to a 30-gene biomarker signature. The optimized Random Forest model achieved 92.31% classification accuracy on the test set, outperforming the Logistic Regression baseline (84.62%). Cross-validation indicated stable performance (mean score of 89.23%). Volcano plot analysis revealed significantly up- and downregulated genes among tumor subtypes, while Principal Component Analysis (PCA) showed distinct clustering, with the first two components explaining approximately 45% of the variance. Several probes, including 200502_s_at, contributed strongly to diagnostic discrimination. This study demonstrates that integrating transcriptomic analysis with machine learning-driven feature selection provides a robust framework for brain cancer subtype classification and biomarker discovery. The results suggest that high-precision diagnostics can be achieved using targeted gene signatures rather than full-transcriptome analysis. The identified biomarkers represent promising candidates for future clinical validation and the development of improved molecular diagnostics in precision oncology.
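The workflow described in the abstract (variance filtering, standardization, RFE down to 30 features, and a grid-searched Random Forest evaluated on an 80/20 split with 5-fold cross-validation) can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the authors' code: the data are synthetic stand-ins generated with `make_classification`, the feature count is reduced to 500 for speed, and the hyperparameter grid, the RFE base estimator, the variance threshold, and the random seeds are all assumptions. The sketch also places the variance filter before `StandardScaler`, since standardized features all have unit variance; the ordering used in the study is not stated in enough detail to confirm this.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the expression matrix: 130 samples, 5 classes
# (four subtypes plus normal tissue). 500 features instead of 54,676 (assumption).
X, y = make_classification(
    n_samples=130, n_features=500, n_informative=40, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# 80/20 stratified train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Keeping every step inside the pipeline means feature selection is refit
# on each cross-validation fold, which avoids leaking held-out samples.
pipeline = Pipeline([
    ("variance", VarianceThreshold(threshold=0.0)),  # drop constant features
    ("scale", StandardScaler()),
    ("rfe", RFE(LogisticRegression(max_iter=1000),
                n_features_to_select=30, step=0.2)),  # base estimator is an assumption
    ("clf", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid; the study's actual search space is not given.
param_grid = {"clf__n_estimators": [100, 200], "clf__max_depth": [None, 5]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

n_selected = int(search.best_estimator_.named_steps["rfe"].support_.sum())
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("selected features:", n_selected)
print("mean CV accuracy:", round(search.best_score_, 4))
print("test accuracy:", round(test_accuracy, 4))
```

The numbers printed by this sketch come from synthetic data and are not comparable to the accuracies reported in the abstract.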
Brain Cancer Gene Expression Analysis for Subtype Classification and Biomarker Discovery Using Machine Learning
Published: 05 June 2026 by MDPI in The 5th International Electronic Conference on Cancers, session Causes, Diagnosis and Treatment of Cancer
Keywords: Biomarker Discovery; Gene Expression; Brain Cancer; Machine Learning; Cancer Subtype Classification