Introduction: Cardiovascular disease (CVD) remains a leading cause of mortality globally. Although tools like the Pooled Cohort Equation (PCE) are widely used, genetic testing remains rare and typically targets high-penetrance monogenic variants. Using UK Biobank data (18,686 CVD cases and 12,100 controls), this study evaluates the combined use of clinical risk, polygenic risk scores (PRSs), and monogenic variants to improve CVD prediction.
Methods: Clinical risk is estimated with the PCE, polygenic risk with a PRS derived from 1.2 million SNPs, and monogenic risk from pathogenic variants in LDLR, APOB, and PCSK9. The three components are combined using categorical rules, logistic regression, and XGBoost, the last of which can capture non-linear effects. We further extend the models with 26 clinical variables drawn from five established risk models.
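To make the idea of a categorical rule concrete, the following is a minimal sketch and not the study's actual rule set. It assumes a participant is flagged as high risk when any one of the three components is high. The cut-offs (`PCE_HIGH_RISK`, `PRS_HIGH_PERCENTILE`) and the names `Participant` and `categorical_high_risk` are hypothetical and chosen only for illustration; the abstract does not state the thresholds used.

```python
from dataclasses import dataclass

# Assumed cut-offs for illustration only; the abstract does not state them.
PCE_HIGH_RISK = 0.075        # 10-year clinical risk threshold (assumption)
PRS_HIGH_PERCENTILE = 80.0   # top 20% of the PRS distribution (assumption)


@dataclass
class Participant:
    pce_risk: float          # 10-year clinical risk from the PCE, in [0, 1]
    prs_percentile: float    # PRS percentile, in [0, 100]
    monogenic_carrier: bool  # pathogenic variant in LDLR, APOB, or PCSK9


def categorical_high_risk(p: Participant) -> bool:
    """Flag a participant as high risk if any single component is high."""
    return (
        p.pce_risk >= PCE_HIGH_RISK
        or p.prs_percentile >= PRS_HIGH_PERCENTILE
        or p.monogenic_carrier
    )
```

Under these assumptions, a participant with low clinical risk but a PRS in the 95th percentile would be reclassified as high risk, which is the kind of case a clinical-only model would miss.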
Results: Cross-validation shows that combined risk models outperform individual predictors in terms of the AUC, particularly among younger individuals. The clinical and PRS models alone yield AUCs of 0.677 and 0.621, respectively, while the categorical, logistic regression, and XGBoost combined models achieve 0.669, 0.699, and 0.701. The categorical model yields the highest net reclassification index (NRI = 0.058), ahead of logistic regression (0.007) and XGBoost (0.010). Incorporating the expanded set of clinical variables further improves performance, with the AUC reaching 0.796 and the NRI increasing to 0.099.
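For readers unfamiliar with the NRI, the sketch below computes the standard two-category form: the net proportion of cases moved up in risk category plus the net proportion of controls moved down, relative to a reference model. It is an illustration under the assumption of a single high/low risk cut-off; the function name `net_reclassification_index` is hypothetical, and the abstract does not specify which NRI variant or reference categories the study used.

```python
def net_reclassification_index(old_high, new_high, is_case):
    """Two-category NRI of a new model versus a reference model.

    old_high, new_high: booleans, True if classified as high risk.
    is_case: booleans, True for CVD cases, False for controls.
    """
    up_case = down_case = up_ctrl = down_ctrl = 0
    n_case = n_ctrl = 0
    for old, new, case in zip(old_high, new_high, is_case):
        moved_up = new and not old      # low -> high under the new model
        moved_down = old and not new    # high -> low under the new model
        if case:
            n_case += 1
            up_case += moved_up
            down_case += moved_down
        else:
            n_ctrl += 1
            up_ctrl += moved_up
            down_ctrl += moved_down
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")
    # Upward moves are correct for cases; downward moves are correct for controls.
    return (up_case - down_case) / n_case + (down_ctrl - up_ctrl) / n_ctrl
```

A positive value means the new model moves more individuals in the correct direction than in the wrong one. Because the metric counts only category changes, a model can improve the AUC while changing the NRI very little, which is consistent with the pattern reported above.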
Conclusions: Our findings underscore the trade-offs among modeling approaches for cardiovascular risk prediction. Combined models outperform individual predictors: logistic regression and XGBoost provide better discrimination, especially when enhanced with additional clinical variables, whereas categorical rules yield a higher NRI.
