Deep learning-based in-ambulance speech recognition and generation of prehospital emergency diagnostic summaries using LLMs

IF 4.1 | Zone 2, Medicine | JCR Q2, Computer Science, Information Systems

International Journal of Medical Informatics | Pub Date: 2025-07-07 | DOI: 10.1016/j.ijmedinf.2025.106029

Chen Chen, Yingying Hu, Wenwei Cai, Huibin Pan, Meihong Shen, Yujie Zhai, Shanhui Wu, Qunyi Zhou, Yi Guo

Volume 203, Article 106029
Article URL: https://www.sciencedirect.com/science/article/pii/S1386505625002461

Citations: 0

Abstract


Deep learning-based in-ambulance speech recognition and generation of prehospital emergency diagnostic summaries using LLMs

Objective

The timely and accurate submission of prehospital electronic medical records is crucial for the efficiency of medical rescue operations. However, the professional experience of personnel, training cycles, and environmental conditions often affect the completion rate of these records. This study proposes integrating noise-robust speech recognition technology with large language models (LLMs) to generate emergency diagnosis summaries. This approach aims to help medical personnel quickly document key patient information, streamlining the emergency response process.

Methods

A joint training model combining speech enhancement and recognition was proposed, incorporating LLMs to generate emergency diagnosis summaries. The model was trained in two rounds using actual ambulance noise data, environmental noise data, and open-source speech datasets. The model optimized the Connectionist Temporal Classification (CTC) and attention losses through deep feature extraction and a selective attention mechanism. The study also analyzed the impact of different prompt designs on the quality of LLM-generated summaries. Tukey HSD and Holm correction methods were employed for multiple comparisons of three subjective evaluation metrics under three prompts for three models, assessing the statistical significance of each factor's influence on the generation results.
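The Holm correction named above can be illustrated with a minimal sketch. This is the generic Holm step-down procedure, not the authors' analysis code; the function name `holm_adjust` and the sample p-values are assumptions made for illustration only.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order.

    Hypothetical helper for illustration; not taken from the paper.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Made-up p-values for three pairwise comparisons.
    raw = [0.01, 0.04, 0.03]
    for p, q in zip(raw, holm_adjust(raw)):
        print(f"raw p = {p:.3f}, Holm-adjusted p = {q:.3f}")
```

A comparison would then be declared significant when its adjusted p-value falls below the chosen level (for example 0.05). In practice a maintained statistics library would normally be preferred over a hand-written routine.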

Results

The proposed speech recognition model reduced the character error rate in real-world ambulance noise recordings to 52.92%, outperforming several comparative speech recognition models. Under the Stylized Prompt condition, the Qwen2.5-7B-Instruct model demonstrated superior accuracy and relevance compared to the other models in the subjective evaluation, and the average completion time for prehospital electronic medical records fell from 20 min to 14 min.
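For readers unfamiliar with the metric, character error rate (CER) is conventionally the character-level edit distance between the recognized text and the reference transcript, divided by the reference length. The sketch below assumes that standard definition; the abstract does not state the authors' exact scoring procedure, and the function names and sample strings are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Made-up transcript pair with one substituted character out of five.
    print(character_error_rate("胸痛三小时", "胸痛二小时"))
```

Because insertions count as errors, CER computed this way can exceed 100% when the hypothesis is much longer than the reference.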

Conclusion

Using noise-robust speech recognition combined with LLMs to generate emergency diagnosis summaries improves efficiency and enhances medical record completion. This approach demonstrates broad application potential in emergencies and could be extended to quality evaluation, disease prediction, and risk assessment.


Source journal

International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)

CiteScore: 8.90

Self-citation rate: 4.10%

Articles published: 217

Review time: 42 days

Journal description: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.