Introduction:
Stroke remains a major global cause of death and long-term disability, emphasizing the need for rapid and accurate diagnosis to support timely clinical decisions.
Methods:
This study proposes a hybrid deep ensemble framework that integrates a convolutional neural network (ResNet50) and a transformer-based model (Swin Transformer) to detect stroke from multimodal neuroimaging data. A publicly available Kaggle dataset was used, comprising 5,336 CT and MRI images (2,695 stroke and 2,641 control) from 230 participants (113 females, 117 males). Features extracted from both networks were fused into a unified representation, and its dimensionality was reduced using PCA-based cumulative weighted neighborhood component analysis (CW-NCA). The reduced features were classified by a stacking ensemble with k-nearest neighbors (KNN), a support vector machine (SVM), XGBoost, and a multilayer perceptron (MLP) as base learners and a linear SVM as the meta-learner.
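The data flow described above (feature fusion, dimensionality reduction, stacked classification) can be sketched with scikit-learn. This is a minimal illustration under several assumptions, not the authors' implementation: the ResNet50 and Swin Transformer embeddings are replaced by synthetic random vectors (the names `cnn_feats` and `vit_feats` are hypothetical), fusion is assumed to be plain concatenation, the CW-NCA step is approximated by standard PCA followed by scikit-learn's `NeighborhoodComponentsAnalysis`, and `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency. All dimensions and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

rng = np.random.default_rng(0)
n = 400
# Synthetic labels: 0 = control, 1 = stroke (NOT the Kaggle data).
y = rng.integers(0, 2, size=n)

# Hypothetical placeholder embeddings. Real pooled ResNet50 features are
# 2048-d; Swin embeddings depend on the variant. Small sizes keep this fast.
cnn_feats = rng.normal(size=(n, 64)) + 0.5 * y[:, None]
vit_feats = rng.normal(size=(n, 32)) + 0.5 * y[:, None]

# Assumed feature-level fusion: concatenate the two embeddings per image.
fused = np.hstack([cnn_feats, vit_feats])

X_tr, X_te, y_tr, y_te = train_test_split(
    fused, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners mirroring the abstract; gradient boosting stands in for XGBoost.
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svm", SVC(kernel="rbf", random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)),
]
# Linear SVM meta-learner trained on 5-fold out-of-fold base predictions.
stack = StackingClassifier(estimators=base_learners, final_estimator=LinearSVC(), cv=5)

# Stand-in for CW-NCA: standardize, PCA, then standard NCA projection.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=20, random_state=0),
    NeighborhoodComponentsAnalysis(n_components=10, random_state=0),
    stack,
)
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
print(f"held-out accuracy on synthetic data: {acc:.3f}")
```

Placing the reduction steps and the stacking ensemble in one `Pipeline` ensures that PCA and NCA are fitted only on training data, which avoids leaking information from the held-out split into the reduced feature space.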
Results:
The proposed model achieved a validation accuracy of 94.10%. For the Normal class, precision, recall, and F1-score were 93.80%, 94.33%, and 94.06%, respectively; for the Stroke class, they were 94.40%, 93.88%, and 94.14%. Each per-class metric differed between the two classes by less than one percentage point, indicating balanced performance without marked bias toward either class.
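Because the F1-score is the harmonic mean of precision and recall, the reported per-class values can be checked directly from the precision and recall figures above. The helper name `f1` is illustrative; the numbers are the ones reported in this abstract.

```python
def f1(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of precision and recall (in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per class, in percent, as reported above.
reported = {
    "Normal": (93.80, 94.33, 94.06),
    "Stroke": (94.40, 93.88, 94.14),
}

for cls, (p, r, f_reported) in reported.items():
    f_computed = f1(p, r)
    print(f"{cls}: computed F1 = {f_computed:.2f}% (reported {f_reported:.2f}%)")
```

Both recomputed values round to the reported F1-scores, so the three metrics are internally consistent for each class.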
Discussion and Conclusion:
Compared with single-model CNN or transformer approaches, the hybrid ensemble showed better generalization and more stable per-class performance. These findings suggest that the proposed framework is a promising decision-support tool for early stroke detection that could help clinicians reach faster and more reliable diagnoses.