Integrating Multi-Omics and Machine Learning for Subtype-Specific Risk Stratification in Breast Cancer: A Step Toward Personalized Preventive Medicine

Swati Rai

¹ Amity Institute of Biotechnology, Amity University Noida, Sector -125, Noida - 201303 (U.P.), India

Academic Editor: Kenneth Pritzker

Published: 23 October 2025 by MDPI in The 1st International Online Conference on Personalized Medicine session Personalized Preventive Medicine

Abstract:

Introduction:
Advances in transcriptomic profiling and machine learning are transforming preventive oncology. Triple-Negative Breast Cancer (TNBC), a clinically aggressive and heterogeneous subtype, has few targeted therapies and is difficult to intercept early. This study investigates whether unsupervised learning can reveal molecular subtypes within TNBC and thereby support individualized risk assessment.

Methods:
Transcriptomic and clinical data from The Cancer Genome Atlas (TCGA) breast cancer cohort were analyzed. TNBC cases were selected according to their basal-like classification. Dimensionality reduction was performed using Principal Component Analysis (PCA), followed by k-means clustering to detect latent subgroups. The genes with the highest expression variance across samples were then used to explore inter-sample expression patterns.
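
The steps described in the Methods can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the expression matrix is synthetic (90 hypothetical samples by 200 hypothetical genes), the helper names `pca_scores`, `kmeans`, and `top_variance_genes` are invented for the example, and it assumes that k-means is run on the first two principal component scores, which the abstract does not state explicitly.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """PCA via SVD of the gene-centered matrix (samples x genes).

    Returns the sample scores on the leading components and the fraction
    of total variance each of those components explains.
    """
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]


def kmeans(points, k=3, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; initial centroids are k random data points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every sample to every centroid, then nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


def top_variance_genes(expr, n_genes=50):
    """Column indices of the n_genes with the highest variance across samples."""
    return np.argsort(expr.var(axis=0))[::-1][:n_genes]


# Synthetic stand-in for a TNBC expression matrix (assumption, not TCGA data):
# three groups of 30 samples, each with its own block of 10 up-shifted genes.
rng = np.random.default_rng(42)
expr = rng.normal(size=(90, 200))
for group in range(3):
    rows = slice(group * 30, (group + 1) * 30)
    cols = slice(group * 10, (group + 1) * 10)
    expr[rows, cols] += 4.0

scores, explained = pca_scores(expr, n_components=2)
labels, centroids = kmeans(scores, k=3)
top_genes = top_variance_genes(expr, n_genes=50)
```

With real data, `expr` would be a normalized (for example log-transformed) expression matrix restricted to the selected TNBC samples. In practice k-means is usually restarted from several random initializations, since a single run can settle in a poor local optimum.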

Results:
PCA of the TNBC transcriptomic data showed marked variation among patient samples, indicating underlying molecular heterogeneity. The first two principal components captured a substantial portion of the variance, allowing clear visual separation of the samples. K-means clustering (k = 3) identified three reproducible molecular subgroups with minimal overlap and strong intra-cluster similarity. The 50 most variably expressed genes showed distinct expression patterns across clusters, suggesting differential pathway activation. Several of these genes are linked to cancer progression, immune response, and therapy resistance, highlighting their potential as early biomarkers. These findings indicate that unsupervised transcriptomic clustering can uncover clinically relevant TNBC subtypes, supporting subtype-specific risk stratification and preventive strategies.
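
Statements such as "minimal overlap and strong intra-cluster similarity" can be given a number. One common choice is the silhouette coefficient, shown below alongside a per-cluster mean expression table for a chosen gene set. This is a hedged sketch: the abstract does not say which metric, if any, was used, the function names `silhouette_scores` and `cluster_mean_profiles` are hypothetical, and the small arrays are toy values rather than study data.

```python
import numpy as np


def silhouette_scores(points, labels):
    """Per-sample silhouette (b - a) / max(a, b).

    a = mean distance to the other members of the sample's own cluster,
    b = mean distance to the members of the nearest other cluster.
    Requires at least two clusters; singleton clusters are scored 0.
    """
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue
        # The self-distance is 0, so dividing by n_same - 1 excludes the sample itself.
        a = dists[i, same].sum() / (n_same - 1)
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores


def cluster_mean_profiles(expr, labels, gene_idx):
    """Mean expression of the selected genes within each cluster (clusters x genes)."""
    return np.vstack([
        expr[labels == c][:, gene_idx].mean(axis=0) for c in np.unique(labels)
    ])


# Toy example: two tight, well-separated groups in a 2-D component space.
toy_points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
toy_labels = np.array([0, 0, 1, 1])
toy_silhouette = silhouette_scores(toy_points, toy_labels)

# Toy expression matrix (4 samples x 3 genes) and a hypothetical gene selection.
toy_expr = np.array([
    [1.0, 5.0, 2.0],
    [3.0, 5.0, 4.0],
    [9.0, 5.0, 0.0],
    [7.0, 5.0, 2.0],
])
toy_profiles = cluster_mean_profiles(toy_expr, toy_labels, gene_idx=[0, 2])
```

A mean silhouette close to 1 corresponds to compact, well-separated clusters, while values near 0 indicate overlapping ones. Comparing the rows of the profile table is one simple way to see which of the most variable genes differ between clusters.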

Conclusions:
This study presents a reproducible framework for uncovering transcriptional heterogeneity in TNBC using unsupervised machine learning. By enabling early subgroup identification, this approach supports the goals of personalized preventive medicine through data-driven stratification and future biomarker discovery.

Keywords: Triple-Negative Breast Cancer; Transcriptomics; Machine Learning; PCA; Clustering; Molecular Subtypes; Biomarkers; Personalized Preventive Medicine
