Introduction
This study presents the development and evaluation of an automated deep learning system for glaucoma detection. Traditional diagnosis of glaucoma is time-consuming and depends heavily on ophthalmologist expertise, leading to inconsistent assessments and delays in treatment. This research applies state-of-the-art transformer-based models to improve both the accuracy and efficiency of glaucoma detection.
Methods
Five publicly available retinal fundus image datasets—ODIR-5K, ACRIMA, RIM-ONE, ORIGA, and REFUGE—were merged into a single large dataset for comprehensive model training and evaluation. The SegFormer model was employed for optic cup and disc segmentation, addressing the limited feature discrimination of traditional CNNs: it captures both local and global context in fundus images, which is critical for accurate glaucoma detection. The segmented images were then classified with the Swin Transformer, whose hierarchical architecture and shifted-window self-attention allow efficient processing of high-resolution images. Data handling and preprocessing were performed with Pandas and NumPy.
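For concreteness, the following is a minimal sketch of the two-stage pipeline using the Hugging Face transformers library. The checkpoint names, the background label index, and the background-masking strategy are illustrative assumptions, not the study's exact configuration; in practice both models would be fine-tuned on the merged fundus dataset.

```python
import numpy as np
import torch
from PIL import Image
from transformers import (
    AutoImageProcessor,
    SegformerForSemanticSegmentation,
    SwinForImageClassification,
)

# Stage 1: SegFormer for optic cup/disc segmentation.
# Generic placeholder checkpoint; a fundus-tuned model would be loaded here.
seg_ckpt = "nvidia/segformer-b0-finetuned-ade-512-512"
seg_processor = AutoImageProcessor.from_pretrained(seg_ckpt)
segmenter = SegformerForSemanticSegmentation.from_pretrained(seg_ckpt)

# Stage 2: Swin Transformer for binary glaucoma classification.
cls_ckpt = "microsoft/swin-tiny-patch4-window7-224"
cls_processor = AutoImageProcessor.from_pretrained(cls_ckpt)
classifier = SwinForImageClassification.from_pretrained(
    cls_ckpt,
    num_labels=2,  # glaucoma vs. healthy
    ignore_mismatched_sizes=True,
)

@torch.no_grad()
def predict(image: Image.Image) -> int:
    # Segment: SegFormer logits come out at 1/4 resolution,
    # so upsample them to the input size before taking the argmax.
    seg_inputs = seg_processor(images=image, return_tensors="pt")
    seg_logits = segmenter(**seg_inputs).logits
    mask = torch.nn.functional.interpolate(
        seg_logits, size=image.size[::-1], mode="bilinear", align_corners=False
    ).argmax(dim=1)[0]  # (H, W) per-pixel class ids

    # Suppress the background so the classifier sees only the segmented
    # cup/disc region (one plausible way to chain the two stages).
    pixels = np.array(image)
    pixels[mask.numpy() == 0] = 0  # assumes label 0 is background
    masked = Image.fromarray(pixels)

    # Classify the masked image with the Swin Transformer.
    cls_inputs = cls_processor(images=masked, return_tensors="pt")
    return classifier(**cls_inputs).logits.argmax(dim=-1).item()
```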
Results
Combining SegFormer for segmentation with Swin Transformer for classification outperformed both standalone models and state-of-the-art CNN-based approaches. The proposed model achieved an accuracy of 97.8%, a precision of 97.5%, a recall of 98.29%, and an F1-score of 98.33%, demonstrating the effectiveness of transformer-based architectures for glaucoma detection.
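For reference, these metrics can be computed from a model's predictions with scikit-learn. The labels below are illustrative only, not the study's data (1 = glaucoma, 0 = healthy).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy ground truth and predictions for demonstration purposes.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

print(f"accuracy:  {accuracy_score(y_true, y_pred):.3f}")
print(f"precision: {precision_score(y_true, y_pred):.3f}")  # TP / (TP + FP)
print(f"recall:    {recall_score(y_true, y_pred):.3f}")     # TP / (TP + FN)
print(f"f1:        {f1_score(y_true, y_pred):.3f}")          # harmonic mean of the two
```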
Conclusions
This research showcases the potential of integrating SegFormer and Swin Transformer models for automated glaucoma detection. The high accuracy and scalability of this system suggest broader applications in medical diagnostics, offering a reliable and efficient solution for clinical settings.