Joel Jia Wei Ng, Eugene Wang, Xinyan Zhou, Kevin Xiang Zhou, Charlene Xing Le Goh, Gabriel Zheng Ning Sim, Hiang Khoon Tan, Serene Si Ning Goh, Qin Xiang Ng
BMC Medical Informatics and Decision Making 25(1):236. Published 2025-07-01. DOI: https://doi.org/10.1186/s12911-025-03061-0. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12220090/pdf/
Evaluating the performance of artificial intelligence-based speech recognition for clinical documentation: a systematic review.
Background: Clinical documentation is vital for effective communication, legal accountability and the continuity of care in healthcare. Traditional documentation methods, such as manual transcription, are time-consuming, prone to errors and contribute to clinician burnout. AI-driven transcription systems utilizing automatic speech recognition (ASR) and natural language processing (NLP) aim to automate and enhance the accuracy and efficiency of clinical documentation. However, the performance of these systems varies significantly across clinical settings, necessitating a systematic review of the published studies.
Methods: A comprehensive search of MEDLINE, Embase, and the Cochrane Library identified studies evaluating AI transcription tools in clinical settings, covering all records up to February 16, 2025. Inclusion criteria encompassed studies involving clinicians using AI-based transcription software, reporting outcomes such as accuracy (e.g., Word Error Rate), time efficiency and user satisfaction. Data were extracted systematically, and study quality was assessed using the QUADAS-2 tool. Due to heterogeneity in study designs and outcomes, a narrative synthesis was performed, with key findings and commonalities reported.
Results: Twenty-nine studies met the inclusion criteria. Reported word error rates ranged widely, from 0.087 (8.7%) in controlled dictation settings to over 0.50 (50%) in conversational or multi-speaker scenarios. F1 scores spanned 0.416 to 0.856, reflecting variability in accuracy. Although some studies highlighted reductions in documentation time and improvements in note completeness, others noted increased editing burdens, inconsistent cost-effectiveness and persistent errors with specialized terminology or accented speech. Recent LLM-based approaches offered automated summarization features, yet often required human review to ensure clinical safety.
Conclusions: AI-based transcription systems show potential to improve clinical documentation but face challenges in accuracy, adaptability and workflow integration. Refinements in domain-specific training, real-time error correction and interoperability with electronic health records are critical for their effective adoption in clinical practice. Future research should also focus on next-generation "digital scribes" incorporating LLM-driven summarization and repurposing of text.
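For readers unfamiliar with the two accuracy metrics reported in the Results, the sketch below shows their conventional definitions. It is illustrative only: the abstract does not state how the individual studies tokenized or normalized text, so the whitespace tokenization and the function names `word_error_rate` and `f1_score` are assumptions made for this example, and the sample sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference.

    Computed as the word-level Levenshtein distance divided by the
    reference length. Tokenization by whitespace is an assumption here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        raise ValueError("F1 is undefined when there are no positives")
    return 2 * true_pos / denominator


if __name__ == "__main__":
    # Invented example: one substitution ("or" -> "and") and one deletion ("of").
    reference = "patient denies chest pain or shortness of breath"
    hypothesis = "patient denies chest pain and shortness breath"
    print(word_error_rate(reference, hypothesis))
    print(f1_score(true_pos=8, false_pos=2, false_neg=2))
```

Because WER is normalized by the reference length, it can exceed 1.0 when the transcript contains many inserted words, which is one reason rates above 50% are plausible in multi-speaker recordings.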
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.