Traditional quantitative structure-activity relationship (QSAR) modeling relies solely on molecular descriptors and ignores the biological state of the target organism. We developed TransQSAR-pf, a framework that integrates Plasmodium falciparum transcriptomic stress-response signatures with classical QSAR descriptors.
Public microarray data (GSE10022) from chloroquine-resistant Plasmodium falciparum strains (treated and control) were analyzed. Differentially expressed genes were identified using limma, and Gene Set Enrichment Analysis (GSEA) revealed pathways related to conserved Plasmodium proteins, RNA-binding proteins, and PfEMP1. A transcriptomic feature matrix (764 features) combining expression signatures, pathway enrichment scores, and variability metrics was integrated with a dataset of 125 triazolopyrimidine derivatives with experimental IC50 values and molecular descriptors. Boruta feature selection was applied to reduce the transcriptomic features to 13 critical predictors representing drug response, genotype effects, strain-specific responses, and expression variability. Random Forest, SVM, and Elastic Net models were trained and optimized via 5-fold cross-validation.
The QSAR-only Random Forest baseline achieved R²=0.719 (RMSE=0.529), while integrating all transcriptomic features without selection performed worse than the baseline (R²=0.602). Critically, the Boruta-selected model achieved R²=0.762 (RMSE=0.470), a 6.1% relative improvement in R² over QSAR-only prediction. Biological mapping revealed that 71.2% of predictive importance derived from conserved genes of unknown function, 17.7% from genotype-specific expression, and 11.1% from direct drug response signatures.
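The shadow-feature idea behind Boruta-style selection can be illustrated with a minimal sketch. This is not the pipeline used in the study: the data are synthetic, the names `shadow_select`, `signal_a`, `signal_b`, and `noise_*` are hypothetical, the importance measure is the absolute Pearson correlation rather than the Random Forest importance that Boruta relies on, and a fixed hit-rate threshold stands in for Boruta's statistical test. The sketch only shows why comparing real features against shuffled copies helps discard uninformative predictors before modeling.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def shadow_select(columns, target, n_trials=50, min_hit_rate=0.9, seed=0):
    """Keep features whose importance beats the best shuffled 'shadow' feature.

    columns: dict mapping feature name -> list of values (one per compound).
    Importance is |Pearson r| with the target (an assumption of this sketch,
    standing in for Random Forest importance).
    """
    rng = random.Random(seed)
    importance = {name: abs(pearson(v, target)) for name, v in columns.items()}
    hits = {name: 0 for name in columns}
    for _ in range(n_trials):
        # Shadows: every column shuffled independently, which destroys
        # any real association with the target.
        shadow_max = 0.0
        for values in columns.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_max = max(shadow_max, abs(pearson(shuffled, target)))
        # A real feature scores a hit when it beats the best shadow.
        for name in columns:
            if importance[name] > shadow_max:
                hits[name] += 1
    return [name for name, h in hits.items() if h / n_trials >= min_hit_rate]


# Synthetic example: 125 "compounds", 2 informative and 20 noise features.
rng = random.Random(42)
n = 125
columns = {
    "signal_a": [rng.gauss(0, 1) for _ in range(n)],
    "signal_b": [rng.gauss(0, 1) for _ in range(n)],
}
for i in range(20):
    columns[f"noise_{i:02d}"] = [rng.gauss(0, 1) for _ in range(n)]
target = [
    2.0 * a - 1.5 * b + rng.gauss(0, 0.5)
    for a, b in zip(columns["signal_a"], columns["signal_b"])
]

selected = shadow_select(columns, target)
print(selected)
```

In this toy setting the two informative features should survive selection while most or all noise features are dropped, which mirrors the reported pattern in which the unfiltered feature matrix hurt performance and the selected subset helped.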
TransQSAR-pf demonstrates that strategic integration of pathogen transcriptomic signatures enhances antiplasmodial activity prediction. Conserved stress response pathways emerged as generalizable predictors of compound efficacy, and uncharacterized genes were identified as high-priority targets for mechanistic validation. This framework represents a shift from structure-only to biology-informed drug discovery, with direct applications for virtual screening of antiplasmodial libraries.
