Introduction: Federated learning (FL) has emerged as a promising paradigm for training machine learning models on distributed clinical data without centralizing patient information. While FL is often promoted as a privacy-preserving alternative to traditional centralized learning, its real-world transparency, regulatory alignment, and clinical utility remain unclear. This systematic review synthesizes the current evidence on how FL is implemented in healthcare settings and evaluates its impact on privacy protection, model performance, and clinical integration.
Methods: We conducted a systematic search of PubMed, Embase, Web of Science, and IEEE Xplore for studies published between January 2018 and October 2025 that applied FL to real or realistically simulated clinical datasets. Eligible studies reported at least one of the following: technical details of FL implementation, privacy or security evaluation, model performance versus centralized approaches, or clinical workflow integration. Following the PRISMA recommendations, two independent reviewers screened records, extracted data, and assessed study quality. FL implementations were thematically categorized by clinical domain, data modality, and privacy/monitoring mechanisms.
Results: In total, 1686 articles were screened, and 90 met our inclusion criteria. Most included studies focused on imaging-intensive fields such as radiology, oncology, and ophthalmology, with fewer applications in laboratory medicine, primary care, and mental health. Across domains, FL models generally achieved performance comparable to centralized training. However, formal privacy evaluations (e.g., membership inference, gradient leakage tests, or differential privacy guarantees) were reported inconsistently. Transparency was limited by sparse reporting on model governance, auditability, and communication with clinicians or patients. Very few studies progressed beyond technical feasibility to sustained clinical deployment.
Conclusions: Federated learning shows strong potential to reconcile multi-institutional data-sharing constraints with the need for performant AI models, but current implementations do not sufficiently address transparency or privacy guarantees. Standardized reporting frameworks, robust privacy audits, and co-designed governance structures are urgently needed to translate FL from experimental prototypes into clinical infrastructure.
