ChatGPT as a Tool for Generating Clinical Simulation Cases in Emergency Medicine Education: A Validity-Based Comparative Study
1  Department of Emergency and Urgent Medical Care, S.D. Asfendiyarov Kazakh National Medical University, Almaty, Kazakhstan
Academic Editor: Mike Joy

Abstract

Artificial intelligence (AI) is increasingly integrated into higher education, offering scalable tools for developing interactive learning resources. In medical education, large language models such as ChatGPT show potential for generating clinical simulation cases; however, their educational validity and clinical reliability remain insufficiently examined. Ensuring the accuracy and pedagogical quality of AI‑generated materials is essential before integrating them into formal training programs.

Methods

The study team comprised one author who wrote the prompts, two authors who wrote the expert (human) clinical scenarios, and five independent expert reviewers. A comparative validity study was conducted using 50 multiple‑choice clinical scenarios (MCQs) in emergency medicine: 25 developed by experienced instructors and 25 generated using ChatGPT (GPT‑5 mini). The scenarios covered five core emergency topics: cardiac arrest, shock, trauma and accidents, acute coronary syndrome, and acute respiratory failure. The five independent experts evaluated all cases against ten predefined content validity criteria: clinical accuracy, completeness, structural clarity, realism, educational value, error‑free presentation, applicability, coherence between scenario and question, uniqueness, and distractor homogeneity. Quantitative assessment included the Item Content Validity Index (I‑CVI) and Aiken’s V coefficient.
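As an illustration of the indices named above, the following minimal sketch computes I‑CVI, S‑CVI/Ave, and Aiken’s V from a table of expert ratings. It is not the authors' analysis code. It assumes a 4‑point rating scale (1 to 4) and the common convention that a rating of 3 or 4 counts as "relevant"; the abstract does not state either. The function names and the sample ratings are hypothetical.

```python
from statistics import mean


def i_cvi(ratings, relevant_min=3):
    """Item-level CVI: share of experts rating the item at or above relevant_min.

    Assumption: ratings of 3 or 4 on a 4-point scale count as relevant.
    """
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def s_cvi_ave(items, relevant_min=3):
    """Scale-level CVI (averaging method): mean of the item-level I-CVI values."""
    return mean(i_cvi(ratings, relevant_min) for ratings in items)


def aiken_v(ratings, lo=1, hi=4):
    """Aiken's V: sum of (rating - lo) divided by n * (hi - lo).

    Assumption: lo and hi are the endpoints of the rating scale (1 and 4 here).
    """
    n = len(ratings)
    return sum(r - lo for r in ratings) / (n * (hi - lo))


# Hypothetical ratings from five experts for two items (not study data).
example_items = [
    [4, 4, 3, 4, 4],  # all five experts rate the item as relevant
    [3, 2, 4, 3, 2],  # only three of five experts rate it as relevant
]

if __name__ == "__main__":
    for ratings in example_items:
        print(ratings, round(i_cvi(ratings), 3), round(aiken_v(ratings), 3))
    print("S-CVI/Ave:", round(s_cvi_ave(example_items), 3))
```

With these invented ratings, the first item reaches an I‑CVI of 1.0 and the second 0.6, so S‑CVI/Ave is 0.8. Aiken’s V uses the full rating values rather than a relevant/not‑relevant cut, which is why it can differ from I‑CVI for the same item.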

Results

Instructor‑developed cases demonstrated significantly higher overall quality than AI‑generated cases (3.8 ± 0.13 vs. 3.0 ± 0.59; p < 0.001). Instructor‑developed cases showed excellent content validity (I‑CVI = 0.984; S‑CVI/Ave = 0.99), whereas AI‑generated cases demonstrated substantially lower validity (I‑CVI = 0.496; S‑CVI/Ave = 0.50). Aiken’s V indicated very high expert agreement for instructor‑developed cases (0.936) and moderate agreement for AI‑generated cases (0.671). Common issues in AI‑generated cases included insufficient clinical detail, heterogeneous distractors, and occasional logical inconsistencies.

Conclusion

ChatGPT can serve as an efficient support tool for the rapid generation of emergency medicine simulation cases. However, expert review and pedagogical refinement remain essential to ensure clinical accuracy and educational quality. AI‑generated content should complement, rather than replace, expert‑designed instructional materials.

Keywords: artificial intelligence; ChatGPT; emergency medicine; medical education; clinical simulation; content validity; assessment design