Purpose: Hepatocellular carcinoma (HCC) is one of the most aggressive and prevalent forms of liver cancer and poses a serious threat to global health, underscoring the urgent need for reliable biomarkers to improve diagnosis and prognosis. Existing approaches often fail to capture nonlinear gene interactions and struggle with high-dimensional expression data, which limits their clinical translation.
Methods: Differentially expressed genes (DEGs) were identified by analyzing the gene expression profiles of 445 HCC and normal tissue samples. The filtered genes were fed to a deep autoencoder to reduce dimensionality, capture nonlinear interactions, and extract informative latent features. Predictive gene features were selected through mutual information (MI) ranking and LASSO regression and subsequently evaluated using logistic regression (LR), random forest (RF), and support vector machine (SVM) classifiers with 5-fold cross-validation. The top 50 genes underwent Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. Protein–protein interaction (PPI) networks were constructed to identify core hub genes, followed by gene–drug interaction analysis, transcription factor analysis, and survival validation.
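The MI ranking step can be illustrated with a minimal sketch. It is not the study's implementation: it assumes each gene is binarized at its median and uses a simple plug-in estimate of mutual information against the tumor/normal label. The gene names (GENE_A, GENE_B, GENE_C), the toy values, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def rank_genes_by_mi(expression, labels, top_k):
    """Rank genes by MI between median-binarized expression and class labels.

    expression: dict mapping gene name -> list of per-sample values
    labels:     list of class labels (e.g. 1 = tumor, 0 = normal)
    """
    scores = {}
    for gene, values in expression.items():
        cutoff = median(values)  # assumption: median split as discretization
        binarized = [int(v > cutoff) for v in values]
        scores[gene] = mutual_information(binarized, labels)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: 4 tumor samples followed by 4 normal samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expression = {
    "GENE_A": [9.1, 8.7, 9.4, 8.9, 2.1, 2.5, 1.9, 2.2],  # separates classes
    "GENE_B": [5.0, 1.0, 5.2, 1.1, 5.1, 0.9, 4.9, 1.2],  # unrelated to class
    "GENE_C": [7.0, 6.8, 7.2, 3.0, 6.9, 3.1, 2.9, 3.2],  # partly informative
}
ranking = rank_genes_by_mi(expression, labels, top_k=2)
print(ranking)
```

In this toy example the gene that separates the two classes ranks first and the gene unrelated to the label is dropped. In practice a library estimator for continuous features would likely replace the median split, and the retained features would then go on to LASSO selection and cross-validated classification.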
Results: Enrichment analysis revealed involvement in metabolic pathways, PI3K/Akt signaling, protein processing in the endoplasmic reticulum, fatty acid metabolism, cell cycle regulation, and viral carcinogenesis. Cross-validation showed stable, high performance across classifiers (accuracy = 0.969, F1 ≈ 0.97), with RF and SVM achieving slightly higher AUC (~0.983 and ~0.982, respectively). Independent testing showed near-perfect performance across all classifiers (~1.0), supporting the high discriminative power of the selected features. Survival analysis showed that hub genes, including HSP90AB1, TUBA1B, PKM, H2AZ1, YWHAZ, ACLY, RAN, ILF2, KPNA2, and TXNRD1, were significantly associated with poor prognosis (HR > 1.5, p < 0.05), correlating with reduced overall, relapse-free, and disease-specific survival in HCC.
Conclusion: This novel integrative framework effectively identifies biologically relevant biomarkers, providing insights into HCC mechanisms and potential targets for precision therapy.
