Introduction and objectives:
Morphological analysis of peripheral blood is essential for the diagnosis of 80% of hematological diseases. Although the automatic classification systems in morphological analyzers support diagnosis, image variability caused by differences in reagents, sample preparation, and analyzer optics between centers degrades their performance. To address this challenge, this study proposes federated learning, a collaborative training approach that adapts to the specific characteristics of each center while maintaining high performance despite image variability.
Methods:
The reference center, the Core Laboratory of the Hospital Clínic de Barcelona, provided 10,298 images covering five leukocyte classes: basophils, eosinophils, lymphocytes, monocytes, and neutrophils. Four public datasets were used as external centers in the federated learning approach: C1 (14,514 images), C2 (2,513), C3 (5,000), and C4 (11,353). The data of each center were divided into training, validation, and test sets.
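The per-center split described above can be sketched as a stratified partition over class labels. The 70/15/15 ratios and the use of numpy below are assumptions for illustration; the abstract only states that the data were divided into three sets.

```python
import numpy as np

def stratified_split(labels, train=0.70, val=0.15, seed=0):
    """Split image indices into train/val/test sets per class.

    `labels` is an array of class ids (e.g. 0..4 for the five
    leukocyte classes). Shuffling within each class keeps the
    class proportions similar across the three sets.
    """
    rng = np.random.default_rng(seed)
    tr, va, te = [], [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        n_tr = int(round(train * len(idx)))
        n_va = int(round(val * len(idx)))
        tr.extend(idx[:n_tr])
        va.extend(idx[n_tr:n_tr + n_va])
        te.extend(idx[n_tr + n_va:])
    return np.array(tr), np.array(va), np.array(te)
```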
Initially, a VGG16 network was trained with the reference center data, achieving 99.4% accuracy. However, accuracy dropped significantly when evaluated on the external centers: 58.6% in C1, 93.2% in C2, 60.3% in C3, and 69.82% in C4.
The model’s performance on each external center was evaluated using precision, recall, specificity, and F1 score:
- C1: precision 0.736, recall 0.586, specificity 0.942, F1 0.657.
- C2: precision 0.944, recall 0.932, specificity 0.983, F1 0.931.
- C3: precision 0.803, recall 0.603, specificity 0.901, F1 0.562.
- C4: precision 0.711, recall 0.698, specificity 0.952, F1 0.785.
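The four metrics above can all be derived from a multiclass confusion matrix. How the per-class values were averaged in the study is not stated; this sketch uses macro averaging as one plausible choice.

```python
import numpy as np

def macro_metrics(cm):
    """Macro-averaged precision, recall, specificity, and F1 from a
    multiclass confusion matrix (rows = true class, cols = predicted).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                  # correct predictions per class
    fp = cm.sum(axis=0) - tp          # predicted as class c but wrong
    fn = cm.sum(axis=1) - tp          # class c missed
    tn = cm.sum() - tp - fp - fn      # everything else
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return (precision.mean(), recall.mean(),
            specificity.mean(), f1.mean())
```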
To improve generalization, a federated learning approach was applied: the first three convolutional blocks of VGG16 were frozen, the remaining blocks were fine-tuned with the training set of each center, and the locally adjusted weights were aggregated using the FedDyn technique. The result is a Final Global Model that is better adapted to the variability between centers.
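The server side of this aggregation can be sketched with the FedDyn update rule (Acar et al., 2021), here applied to model weights flattened into vectors. The value of the regularization coefficient `alpha` is hypothetical, and the clients' local dynamic-regularizer terms are omitted for brevity; only the server step is shown.

```python
import numpy as np

def feddyn_server_round(theta_prev, client_thetas, h_prev, alpha=0.01):
    """One FedDyn server aggregation step over flattened weight vectors.

    h_t     = h_{t-1} - alpha * mean_k(theta_k - theta_prev)
    theta_t = mean_k(theta_k) - (1/alpha) * h_t

    The state vector h accumulates the average client drift, which
    corrects the plain average toward a consensus solution.
    """
    client_thetas = np.stack(client_thetas)
    drift = (client_thetas - theta_prev).mean(axis=0)
    h = h_prev - alpha * drift
    theta = client_thetas.mean(axis=0) - h / alpha
    return theta, h
```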
Results:
The test sets of all four centers were evaluated again with the Final Global Model, showing marked increases in classification accuracy: 96.18% in C1, 99.6% in C2, 99.5% in C3, and 84.75% in C4, with corresponding improvements in the other metrics:
- C1: precision 0.964, recall 0.952, specificity 0.992, F1 0.958.
- C2: precision 0.992, recall 0.992, specificity 0.998, F1 0.992.
- C3: precision 0.974, recall 0.973, specificity 0.993, F1 0.973.
- C4: precision 0.884, recall 0.888, specificity 0.992, F1 0.885.
Conclusion:
Federated learning can effectively fine-tune classifiers in multicenter settings, making the model more robust to the variability between different datasets. This approach shows potential as a tool for automatic leukocyte recognition in multicenter contexts.