Evaluating a Large Language Model in Translating Patient Instructions to Spanish Using a Standardized Framework

Impact factor: 18 | Zone 1 (Medicine) | JCR Q1 (Pediatrics)
Mondira Ray, Daniel J Kats, Joss Moorkens, Dinesh Rai, Nate Shaar, Diane Quinones, Alejandro Vermeulen, Camila M Mateo, Ryan C L Brewster, Alisa Khan, Benjamin Rader, John S Brownstein, Jonathan D Hron
Journal: JAMA Pediatrics
Published: 2025-07-07
Publication type: Journal Article
DOI: 10.1001/jamapediatrics.2025.1729
URL: https://doi.org/10.1001/jamapediatrics.2025.1729
Citations: 0

Abstract

Importance
Patients and caregivers who use languages other than English in the US encounter barriers to accessing language-concordant written instructions after clinical visits. Large language models (LLMs), such as OpenAI's GPT-4o, may improve access to translated patient materials; however, rigorous evaluation is needed to ensure clinical standards are met.

Objective
To determine whether GPT-4o can generate high-quality Spanish translations of personalized patient instructions comparable to those performed by professional human translators.

Design, Setting, and Participants
This cross-sectional study compared LLM translations to professional human translations using equivalence testing. The personalized pediatric instructions used were derived from real clinical encounters at a large US academic medical center and translated between January 2023 and December 2023. Patient instructions in English were translated into Spanish by GPT-4o and professional human translators. The source English texts were translated using GPT-4o on August 2, 2024. Both sets of translations were evaluated by 3 independent professional medical translators.

Exposure
Patient instructions were translated using GPT-4o with an engineered prompt, and these translations were compared with those produced by professional human translators.

Main Outcomes and Measures
The primary outcome was translation quality, assessed using the Multidimensional Quality Metrics (MQM) framework to generate an overall MQM score (rated on a 0-100 scale). Secondary outcomes included a general preference rating and error rates for types of translation errors.

Results
This study included 20 source files of pediatric patient instructions. Equivalence testing showed no significant difference in translation quality between GPT-4o and human translations, with a mean difference of 1.6 points (90% CI, 0.7-2.5), falling within a predefined equivalence margin of plus or minus 5 MQM points. The LLM yielded fewer mistranslation errors, and a mean (SE) of 52% (6%) of professional translator ratings preferred the LLM translations.

Conclusions and Relevance
In this cross-sectional study, GPT-4o generated Spanish translations of pediatric patient instructions that were comparable in quality to those by professional human translators as evaluated using a standardized framework. While human review of LLM translation remains essential in health care, these findings suggest that GPT-4o could reduce the translation workload for Spanish, potentially freeing resources to support languages of lesser diffusion.
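
The equivalence result reported in the abstract can be illustrated with a confidence-interval check. The abstract does not describe the exact statistical procedure, so the following is only a minimal sketch, assuming the standard reading that equivalence is concluded when the 90% CI of the mean difference lies entirely inside the predefined margin. The function name `within_equivalence_margin` is hypothetical; the numbers are the ones quoted in the abstract.

```python
def within_equivalence_margin(ci_low: float, ci_high: float, margin: float) -> bool:
    """Return True when the whole confidence interval lies inside (-margin, +margin)."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    return -margin < ci_low and ci_high < margin


# Values quoted in the abstract (MQM points on a 0-100 scale).
MEAN_DIFFERENCE = 1.6
CI_90 = (0.7, 2.5)
MARGIN = 5.0  # predefined equivalence margin of plus or minus 5 points

# Assumption: equivalence is judged by CI inclusion in the margin.
equivalent = within_equivalence_margin(CI_90[0], CI_90[1], MARGIN)
print(f"difference={MEAN_DIFFERENCE}, 90% CI={CI_90}, margin=±{MARGIN}: equivalent={equivalent}")
```

Under this reading, equivalence is a statement about the margin: the interval may exclude zero and still fall within plus or minus 5 points.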

Source journal
JAMA Pediatrics (subject category: Pediatrics)
CiteScore: 31.60
Self-citation rate: 1.90%
Articles published: 357
About the journal: JAMA Pediatrics, published continuously since 1911 and the oldest continuously published pediatric journal in the US, is an international peer-reviewed publication and part of the JAMA Network. Published weekly online and in 12 issues annually, it garners over 8.4 million article views and downloads yearly. All research articles become freely accessible online after 12 months without any author fees, and through the WHO's HINARI program, the online version is accessible to institutions in developing countries. With a focus on advancing the health of infants, children, and adolescents, JAMA Pediatrics serves as a platform for discussing crucial issues and policies in child and adolescent health care. Leveraging the latest technology, it ensures timely access to information for its readers worldwide.