Evaluation of the accuracy and safety of machine translation of patient-specific discharge instructions: a comparative analysis.

Marianna Kong, Alicia Fernandez, Jaskaran Bains, Ana Milisavljevic, Katherine C Brooks, Akash Shanmugam, Leslie Avilez, Junhong Li, Vladyslav Honcharov, Andersen Yang, Elaine C Khoong

BMJ Quality & Safety, published online 2025-07-09. DOI: 10.1136/bmjqs-2024-018384. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12252260/pdf/
Introduction: Machine translation of patient-specific information could mitigate language barriers if it is sufficiently accurate and non-harmful, and it may be particularly useful in healthcare encounters when professional translators are not readily available. We evaluated the translation accuracy and potential for harm of ChatGPT-4 and Google Translate in translating from English to Spanish, Chinese and Russian.
Methods: We used ChatGPT-4 and Google Translate to translate 50 sets (316 sentences) of deidentified, patient-specific, clinician-written free-text emergency department discharge instructions into Spanish, Chinese and Russian. These were then back-translated into English by professional translators and independently double-coded by physicians for accuracy and potential for clinical harm.
Results: At the sentence level, we found that both tools were ≥90% accurate in translating English to Spanish (accuracy: GPT 97%; Google Translate 96%) and English to Chinese (accuracy: GPT 95%; Google Translate 90%); neither tool performed as well in translating English to Russian (accuracy: GPT 89%; Google Translate 80%). At the instruction set level, 16%, 24% and 56% of Spanish, Chinese and Russian GPT-translated instruction sets, respectively, contained at least one inaccuracy. For Google Translate, 24%, 56% and 66% of Spanish, Chinese and Russian translations contained at least one inaccuracy. The potential for harm due to inaccurate translations was ≤1% for both tools in all languages at the sentence level and ≤6% at the instruction set level. GPT was significantly more accurate than Google Translate in Chinese and Russian at the sentence level; the potential for harm was similar.
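The distinction between sentence-level accuracy and instruction-set-level inaccuracy rates can be illustrated with a minimal Python sketch. The data below are hypothetical, not the study's coded results; each instruction set is a list of per-sentence codes where True means the sentence was judged accurately translated. A set counts as "containing at least one inaccuracy" if any of its sentences was coded False, which is why set-level inaccuracy rates run much higher than sentence-level error rates.

```python
# Hypothetical double-coded translation data (NOT the study's dataset).
# Each inner list is one instruction set; True = sentence judged accurate.
coded_sets = [
    [True, True, True],         # fully accurate set
    [True, False, True, True],  # one inaccurate sentence
    [True, True],               # fully accurate set
    [False, True, True],        # one inaccurate sentence
]

# Sentence-level accuracy: accurate sentences / all sentences.
sentences = [code for s in coded_sets for code in s]
sentence_accuracy = sum(sentences) / len(sentences)

# Instruction-set level: fraction of sets with at least one inaccuracy.
sets_with_error = sum(1 for s in coded_sets if not all(s))
set_inaccuracy_rate = sets_with_error / len(coded_sets)

print(f"sentence-level accuracy: {sentence_accuracy:.0%}")    # 10 of 12
print(f"sets with >=1 inaccuracy: {set_inaccuracy_rate:.0%}")  # 2 of 4
```

With only 2 inaccurate sentences out of 12 (83% sentence accuracy), half of the instruction sets still contain an error, mirroring how, for example, 95% sentence-level accuracy in Chinese coexisted with 24% of GPT-translated sets containing at least one inaccuracy.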
Conclusion: These results support the potential of machine translation tools to mitigate gaps in translation services for low-stakes written communication from English to Spanish, while also strengthening the case for caution and professional oversight in higher-risk communication. Further research is needed to evaluate machine translation for other languages and for more technical content.
Journal overview:
BMJ Quality & Safety (previously Quality & Safety in Health Care) is an international peer-reviewed publication providing research, opinions, debates and reviews for academics, clinicians and healthcare managers focused on the quality and safety of health care and the science of improvement.
The journal receives approximately 1000 manuscripts a year and has an acceptance rate for original research of 12%. Time from submission to first decision averages 22 days and accepted articles are typically published online within 20 days. Its current impact factor is 3.281.