Integrating Multi-Omics and Machine Learning for Subtype-Specific Risk Stratification in Breast Cancer: A Step Toward Personalized Preventive Medicine
1  Amity Institute of Biotechnology, Amity University Noida, Sector -125, Noida - 201303 (U.P.), India
Academic Editor: Kenneth Pritzker

Abstract:

Introduction:
Advances in transcriptomic profiling and machine learning are transforming preventive oncology. Triple-Negative Breast Cancer (TNBC), a clinically aggressive and heterogeneous subtype, lacks targeted therapies and is difficult to intercept early. This study investigates whether unsupervised learning can reveal molecular subtypes within TNBC and thereby support individualized risk assessment strategies.

Methods:
Transcriptomic and clinical data from the TCGA breast cancer cohort were analyzed. TNBC cases were identified based on basal-like classification. Dimensionality reduction was performed using Principal Component Analysis (PCA), followed by k-means clustering to detect latent subgroups. Genes with the highest variance were used to explore inter-sample expression patterns.
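The workflow described above (variance-based gene selection, PCA, then k-means) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy expression matrix, the power-iteration PCA, and the farthest-point initialization of k-means are all assumptions chosen to keep the example dependency-free. A real analysis of TCGA data would more likely rely on an established library.

```python
import math
import random


def top_variance_genes(expr, n_top):
    """Return indices of the n_top genes with the highest sample variance.

    expr is a list of samples; each sample is a list of expression values.
    """
    n_samples, n_genes = len(expr), len(expr[0])
    variances = []
    for g in range(n_genes):
        column = [sample[g] for sample in expr]
        mean = sum(column) / n_samples
        variances.append(sum((v - mean) ** 2 for v in column) / (n_samples - 1))
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return ranked[:n_top]


def pca_scores(X, n_components=2, n_iter=200, seed=0):
    """Project samples onto the leading principal components.

    Uses power iteration with deflation (an illustrative choice).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    R = [[row[j] - means[j] for j in range(p)] for row in X]  # centered data
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(n_iter):
            t = [sum(r[j] * v[j] for j in range(p)) for r in R]            # R v
            w = [sum(R[i][j] * t[i] for i in range(n)) for j in range(p)]  # R^T R v
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        t = [sum(r[j] * v[j] for j in range(p)) for r in R]  # scores on this component
        components.append(t)
        # Deflate: remove the variance explained by this component.
        R = [[R[i][j] - t[i] * v[j] for j in range(p)] for i in range(n)]
    return [list(s) for s in zip(*components)]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


def make_toy_data(n_per_group=10, seed=1):
    """Hypothetical data: 3 sample groups, 4 informative genes, 4 noise genes."""
    rng = random.Random(seed)
    patterns = [(0, 0, 0, 0), (10, 10, 0, 0), (0, 0, 10, 10)]
    expr, groups = [], []
    for group, pattern in enumerate(patterns):
        for _ in range(n_per_group):
            informative = [m + rng.gauss(0, 0.3) for m in pattern]
            noise = [5 + rng.gauss(0, 0.3) for _ in range(4)]
            expr.append(informative + noise)
            groups.append(group)
    return expr, groups


expr, groups = make_toy_data()
genes = top_variance_genes(expr, n_top=4)
reduced = pca_scores([[s[g] for g in genes] for s in expr], n_components=2)
labels = kmeans(reduced, k=3)
print(sorted(genes), labels)
```

On the toy matrix the variance filter keeps the four informative genes, and the three simulated groups fall into separate clusters. Real expression data are far noisier, so the number of retained genes, the number of components, and the choice of k would each need to be justified separately.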

Results:
PCA of the TNBC transcriptomic data showed pronounced variation among patient samples, indicating underlying molecular heterogeneity. The first two principal components captured a substantial portion of the variance, allowing clear visual separation of samples. K-means clustering (k = 3) identified three reproducible molecular subgroups with minimal overlap and strong intra-cluster similarity. The 50 most variably expressed genes showed distinct expression patterns across clusters, suggesting differential pathway activation. Several of these genes are linked to cancer progression, immune response, and therapy resistance, highlighting their potential as early biomarkers. These findings demonstrate that unsupervised transcriptomic clustering can uncover clinically relevant TNBC subtypes, supporting subtype-specific risk stratification and preventive strategies.
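Statements such as "minimal overlap" and "strong intra-cluster similarity" can be quantified. The abstract does not say which metric, if any, was used. One common option is the mean silhouette coefficient, sketched below as an assumption rather than as the study's method. The function name and the four example points are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For each point, a is the mean distance to the other members of its
    cluster and b is the lowest mean distance to any other cluster;
    the coefficient is (b - a) / max(a, b). Singleton clusters score 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes 0
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Two tight, well-separated pairs of points (hypothetical values).
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(pts, [0, 0, 1, 1])  # labels follow the geometry
bad = silhouette_score(pts, [0, 1, 0, 1])   # labels cut across it
print(round(good, 3), round(bad, 3))
```

Values near 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or misassigned samples. Comparing the score across several values of k is one way to check whether k = 3 is well supported.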

Conclusions:
This study presents a reproducible framework for uncovering transcriptional heterogeneity in TNBC using unsupervised machine learning. By enabling early subgroup identification, this approach supports the goals of personalized preventive medicine through data-driven stratification and future biomarker discovery.

Keywords: Triple-Negative Breast Cancer; Transcriptomics; Machine Learning; PCA; Clustering; Molecular Subtypes; Biomarkers; Personalized Preventive Medicine

 
 