Enhancing Emergency Medical Communication: A Multi-Model Information Extraction Pipeline for Ambulance Communication using LLMs
1  Computer Science, Saarland University, 66123 Saarbrücken, Germany
2  Educational Technology Lab, German Research Center for Artificial Intelligence (DFKI), 10587 Berlin, Germany
3  Hochschule Bremerhaven, 27568 Bremerhaven, Germany
4  Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
5  University of South Wales, Cardiff CF24 2FN, United Kingdom
Academic Editor: Lucia Billeci

Abstract:

Effective communication in emergency medical services is critical in high-stakes scenarios, where information must be conveyed with speed, precision, and clarity. However, background noise, stress-induced speech patterns, and the use of specialized medical terminology frequently hinder comprehension. Improving the reliability of emergency communication is therefore a pressing challenge for both clinical outcomes and operational efficiency. This paper introduces a robust multi-model information extraction pipeline designed to enhance the accuracy and efficiency of emergency medical communication. The pipeline integrates advanced Speech-to-Text (STT) systems with Large Language Models (LLMs) to both improve transcription fidelity and extract mission-critical medical data. It comprises four modules: (1) audio capture of simulated German emergency communications under varied acoustic conditions, (2) STT transcription using Whisper, Azure, and IBM Watson, (3) LLM-driven refinement of the transcriptions with GPT-4 to correct grammatical and terminological errors, and (4) structured information extraction with GPT-4, LLaMA 3.2, and Mixtral-8, guided by Chain-of-Thought and role-based prompting. The complete pipeline is evaluated using Word Error Rate (WER), BLEU, ROUGE-L, and semantic similarity, alongside the accuracy, completeness, and relevance of the extracted data. The combination of Azure STT with GPT-4 refinement proved optimal, achieving the lowest post-refinement WER (0.1812, a 32.7% improvement) together with high semantic similarity (0.9736), ROUGE-L (0.8802), and BLEU (0.7457) scores. GPT-4 reached near-perfect extraction accuracy (0.995), surpassing LLaMA 3.2 and Mixtral-8, though Mixtral-8 remained highly competitive (0.980 accuracy with Whisper transcripts). Overall, the proposed pipeline demonstrates how combining STT and LLMs can transform noisy emergency dialogues into precise, structured clinical data, advancing responsive and reliable emergency management systems.
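To make the four-module design described above concrete, the sketch below strings the stages together in Python. It is a minimal illustration only, assuming the openai-whisper, openai, and jiwer packages; the model identifiers, prompts, file names, and extraction fields are hypothetical placeholders, not the exact configuration reported in the paper.

```python
# Minimal sketch of the four-module pipeline (assumed packages: openai-whisper,
# openai, jiwer). Prompts, model names, and field names are illustrative only.
import json
import whisper                     # local Whisper STT (module 2)
from jiwer import wer              # Word Error Rate (evaluation)
from openai import OpenAI

client = OpenAI()                  # reads OPENAI_API_KEY from the environment

def transcribe(audio_path: str) -> str:
    """Module 2: Speech-to-Text on a recorded German emergency call."""
    model = whisper.load_model("small")
    return model.transcribe(audio_path, language="de")["text"]

def refine(raw_transcript: str) -> str:
    """Module 3: GPT-4 corrects grammar and medical terminology."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a German paramedic documentation assistant. "
                        "Correct grammar and medical terminology in the transcript "
                        "without adding or removing clinical information."},
            {"role": "user", "content": raw_transcript},
        ],
    )
    return response.choices[0].message.content

def extract(refined_transcript: str) -> dict:
    """Module 4: role-based, chain-of-thought prompted extraction to JSON."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are an emergency physician. Reason step by step "
                        "internally, then return only a JSON object with the keys: "
                        "patient_age, chief_complaint, vital_signs, interventions, "
                        "destination."},
            {"role": "user", "content": refined_transcript},
        ],
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    raw = transcribe("simulated_emergency_call.wav")   # module 1: recorded call (hypothetical file)
    refined = refine(raw)
    record = extract(refined)
    # Comparing WER before and after refinement illustrates the reported gain.
    reference = open("reference_transcript.txt", encoding="utf-8").read()
    print("WER raw:", wer(reference, raw), "| WER refined:", wer(reference, refined))
    print(record)
```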

Keywords: Emergency Medical Communication; Speech-to-Text; Large Language Models; Information Extraction; Natural Language Processing; Healthcare AI; Prompt Engineering
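For reference, the surface-level and semantic metrics named in the abstract (WER, BLEU, ROUGE-L, and embedding-based similarity) can be computed with standard open-source tooling. The snippet below is an illustrative sketch, assuming the jiwer, sacrebleu, rouge-score, and sentence-transformers packages; the example sentences and the embedding model are invented for demonstration and are not taken from the study.

```python
# Illustrative computation of the evaluation metrics named in the abstract.
from jiwer import wer
import sacrebleu
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

reference  = "Patient männlich, 67 Jahre, Verdacht auf Myokardinfarkt, RR 90/60."
hypothesis = "Patient männlich 67 Jahre Verdacht auf Herzinfarkt RR 90 zu 60."

# Word Error Rate (lower is better)
print("WER:", wer(reference, hypothesis))

# Sentence-level BLEU, rescaled to 0-1 (higher is better)
print("BLEU:", sacrebleu.sentence_bleu(hypothesis, [reference]).score / 100)

# ROUGE-L F-measure (higher is better)
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)
print("ROUGE-L:", scorer.score(reference, hypothesis)["rougeL"].fmeasure)

# Semantic similarity via multilingual sentence embeddings (higher is better)
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
emb_ref, emb_hyp = model.encode([reference, hypothesis], convert_to_tensor=True)
print("Semantic similarity:", util.cos_sim(emb_ref, emb_hyp).item())
```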