Evaluation of the accuracy and safety of machine translation of patient-specific discharge instructions: a comparative analysis.

IF 5.6 | Zone 1, Medicine | Q1 HEALTH CARE SCIENCES & SERVICES
Marianna Kong, Alicia Fernandez, Jaskaran Bains, Ana Milisavljevic, Katherine C Brooks, Akash Shanmugam, Leslie Avilez, Junhong Li, Vladyslav Honcharov, Andersen Yang, Elaine C Khoong
Journal: BMJ Quality & Safety
Publication date: 2025-07-09
Publication type: Journal Article
DOI: 10.1136/bmjqs-2024-018384 (https://doi.org/10.1136/bmjqs-2024-018384)
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12252260/pdf/

Abstract

Introduction: Machine translation of patient-specific information could mitigate language barriers if sufficiently accurate and non-harmful and may be particularly useful in healthcare encounters when professional translators are not readily available. We evaluated the translation accuracy and potential for harm of ChatGPT-4 and Google Translate in translating from English to Spanish, Chinese and Russian.

Methods: We used ChatGPT-4 and Google Translate to translate 50 sets (316 sentences) of deidentified, patient-specific, clinician free-text emergency department instructions into Spanish, Chinese and Russian. These were then back-translated into English by professional translators and double-coded by physicians for accuracy and potential for clinical harm.

Results: At the sentence level, we found that both tools were ≥90% accurate in translating English to Spanish (accuracy: GPT 97%, Google Translate 96%) and English to Chinese (accuracy: GPT 95%; Google Translate 90%); neither tool performed as well in translating English to Russian (accuracy: GPT 89%; Google Translate 80%). At the instruction set level, 16%, 24% and 56% of Spanish, Chinese and Russian GPT-translated instruction sets contained at least one inaccuracy. For Google Translate, 24%, 56% and 66% of Spanish, Chinese and Russian translations contained at least one inaccuracy. The potential for harm due to inaccurate translations was ≤1% for both tools in all languages at the sentence level and ≤6% at the instruction set level. GPT was significantly more accurate than Google Translate in Chinese and Russian at the sentence level; the potential for harm was similar.
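The gap between the sentence-level and instruction-set-level figures follows from simple compounding: a set is counted as inaccurate if any one of its sentences is. The sketch below is only an illustration of that arithmetic, not the authors' analysis. It assumes, purely for the sake of the example, that errors are independent across sentences and that every set has the average length (316 sentences / 50 sets); the function name `expected_set_inaccuracy` is hypothetical.

```python
# Sketch: how sentence-level accuracy compounds into set-level inaccuracy.
# Assumptions (NOT from the study): sentence errors are independent, and
# every instruction set has the average length.

SENTENCES = 316
SETS = 50
AVG_LEN = SENTENCES / SETS  # about 6.3 sentences per instruction set

# (sentence-level accuracy, reported share of sets with >= 1 inaccuracy),
# both taken from the abstract.
REPORTED = {
    ("GPT", "Spanish"): (0.97, 0.16),
    ("GPT", "Chinese"): (0.95, 0.24),
    ("GPT", "Russian"): (0.89, 0.56),
    ("Google Translate", "Spanish"): (0.96, 0.24),
    ("Google Translate", "Chinese"): (0.90, 0.56),
    ("Google Translate", "Russian"): (0.80, 0.66),
}


def expected_set_inaccuracy(sentence_accuracy: float, n: float = AVG_LEN) -> float:
    """Probability that a set of n sentences contains at least one inaccuracy."""
    return 1.0 - sentence_accuracy ** n


if __name__ == "__main__":
    for (tool, lang), (acc, reported) in REPORTED.items():
        est = expected_set_inaccuracy(acc)
        print(f"{tool:16s} {lang:8s} estimated {est:.0%}  reported {reported:.0%}")
```

Where the estimate and the reported figure diverge, the likely explanations are that set lengths vary or that errors cluster within particular instruction sets, both of which this simplified model ignores.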

Conclusion: These results support the potential of machine translation tools to mitigate gaps in translation services for low-stakes written communication from English to Spanish, while also strengthening the case for caution and for professional oversight in non-low-risk communication. Further research is needed to evaluate machine translation for other languages and more technical content.

Source journal: BMJ Quality & Safety (HEALTH CARE SCIENCES & SERVICES)
CiteScore: 9.80
Self-citation rate: 7.40%
Articles published: 104
Review time: 4-8 weeks
About the journal: BMJ Quality & Safety (previously Quality & Safety in Health Care) is an international peer-reviewed publication providing research, opinions, debates and reviews for academics, clinicians and healthcare managers focused on the quality and safety of health care and the science of improvement. The journal receives approximately 1000 manuscripts a year and has an acceptance rate for original research of 12%. Time from submission to first decision averages 22 days and accepted articles are typically published online within 20 days. Its current impact factor is 3.281.