Large Language Model Ability to Translate CT and MRI Free-Text Radiology Reports Into Multiple Languages.

IF 12.1 1区 医学 Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Radiology Pub Date : 2024-12-01 DOI:10.1148/radiol.241736
Aymen Meddeb, Sophia Lüken, Felix Busch, Lisa Adams, Lorenzo Ugga, Emmanouil Koltsakis, Antonios Tzortzakakis, Soumaya Jelassi, Insaf Dkhil, Michail E Klontzas, Matthaios Triantafyllou, Burak Kocak, Sabahattin Yüzkan, Longjiang Zhang, Bin Hu, Anna Andreychenko, Efimtcev Alexander Yurievich, Tatiana Logunova, Wipawee Morakote, Salita Angkurawaranon, Marcus R Makowski, Mike P Wattjes, Renato Cuocolo, Keno Bressem
{"title":"Large Language Model Ability to Translate CT and MRI Free-Text Radiology Reports Into Multiple Languages.","authors":"Aymen Meddeb, Sophia Lüken, Felix Busch, Lisa Adams, Lorenzo Ugga, Emmanouil Koltsakis, Antonios Tzortzakakis, Soumaya Jelassi, Insaf Dkhil, Michail E Klontzas, Matthaios Triantafyllou, Burak Kocak, Sabahattin Yüzkan, Longjiang Zhang, Bin Hu, Anna Andreychenko, Efimtcev Alexander Yurievich, Tatiana Logunova, Wipawee Morakote, Salita Angkurawaranon, Marcus R Makowski, Mike P Wattjes, Renato Cuocolo, Keno Bressem","doi":"10.1148/radiol.241736","DOIUrl":null,"url":null,"abstract":"<p><p>Background High-quality translations of radiology reports are essential for optimal patient care. Because of limited availability of human translators with medical expertise, large language models (LLMs) are a promising solution, but their ability to translate radiology reports remains largely unexplored. Purpose To evaluate the accuracy and quality of various LLMs in translating radiology reports across high-resource languages (English, Italian, French, German, and Chinese) and low-resource languages (Swedish, Turkish, Russian, Greek, and Thai). Materials and Methods A dataset of 100 synthetic free-text radiology reports from CT and MRI scans was translated by 18 radiologists between January 14 and May 2, 2024, into nine target languages. Ten LLMs, including GPT-4 (OpenAI), Llama 3 (Meta), and Mixtral models (Mistral AI), were used for automated translation. Translation accuracy and quality were assessed with use of BiLingual Evaluation Understudy (BLEU) score, translation error rate (TER), and CHaRacter-level F-score (chrF++) metrics. Statistical significance was evaluated with use of paired <i>t</i> tests with Holm-Bonferroni corrections. Radiologists also conducted a qualitative evaluation of translations with use of a standardized questionnaire. Results GPT-4 demonstrated the best overall translation quality, particularly from English to German (BLEU score: 35.0 ± 16.3 [SD]; TER: 61.7 ± 21.2; chrF++: 70.6 ± 9.4), to Greek (BLEU: 32.6 ± 10.1; TER: 52.4 ± 10.6; chrF++: 62.8 ± 6.4), to Thai (BLEU: 53.2 ± 7.3; TER: 74.3 ± 5.2; chrF++: 48.4 ± 6.6), and to Turkish (BLEU: 35.5 ± 6.6; TER: 52.7 ± 7.4; chrF++: 70.7 ± 3.7). GPT-3.5 showed highest accuracy in translations from English to French, and Qwen1.5 excelled in English-to-Chinese translations, whereas Mixtral 8x22B performed best in Italian-to-English translations. The qualitative evaluation revealed that LLMs excelled in clarity, readability, and consistency with the original meaning but showed moderate medical terminology accuracy. Conclusion LLMs showed high accuracy and quality for translating radiology reports, although results varied by model and language pair. © RSNA, 2024 <i>Supplemental material is available for this article.</i></p>","PeriodicalId":20896,"journal":{"name":"Radiology","volume":"313 3","pages":"e241736"},"PeriodicalIF":12.1000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1148/radiol.241736","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0

Abstract

Background High-quality translations of radiology reports are essential for optimal patient care. Because of limited availability of human translators with medical expertise, large language models (LLMs) are a promising solution, but their ability to translate radiology reports remains largely unexplored. Purpose To evaluate the accuracy and quality of various LLMs in translating radiology reports across high-resource languages (English, Italian, French, German, and Chinese) and low-resource languages (Swedish, Turkish, Russian, Greek, and Thai). Materials and Methods A dataset of 100 synthetic free-text radiology reports from CT and MRI scans was translated by 18 radiologists between January 14 and May 2, 2024, into nine target languages. Ten LLMs, including GPT-4 (OpenAI), Llama 3 (Meta), and Mixtral models (Mistral AI), were used for automated translation. Translation accuracy and quality were assessed with use of BiLingual Evaluation Understudy (BLEU) score, translation error rate (TER), and CHaRacter-level F-score (chrF++) metrics. Statistical significance was evaluated with use of paired t tests with Holm-Bonferroni corrections. Radiologists also conducted a qualitative evaluation of translations with use of a standardized questionnaire. Results GPT-4 demonstrated the best overall translation quality, particularly from English to German (BLEU score: 35.0 ± 16.3 [SD]; TER: 61.7 ± 21.2; chrF++: 70.6 ± 9.4), to Greek (BLEU: 32.6 ± 10.1; TER: 52.4 ± 10.6; chrF++: 62.8 ± 6.4), to Thai (BLEU: 53.2 ± 7.3; TER: 74.3 ± 5.2; chrF++: 48.4 ± 6.6), and to Turkish (BLEU: 35.5 ± 6.6; TER: 52.7 ± 7.4; chrF++: 70.7 ± 3.7). GPT-3.5 showed highest accuracy in translations from English to French, and Qwen1.5 excelled in English-to-Chinese translations, whereas Mixtral 8x22B performed best in Italian-to-English translations. The qualitative evaluation revealed that LLMs excelled in clarity, readability, and consistency with the original meaning but showed moderate medical terminology accuracy. Conclusion LLMs showed high accuracy and quality for translating radiology reports, although results varied by model and language pair. © RSNA, 2024 Supplemental material is available for this article.

将 CT 和 MRI 自由文本放射学报告翻译成多种语言的大语言模型能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Radiology
Radiology 医学-核医学
CiteScore
35.20
自引率
3.00%
发文量
596
审稿时长
3.6 months
期刊介绍: Published regularly since 1923 by the Radiological Society of North America (RSNA), Radiology has long been recognized as the authoritative reference for the most current, clinically relevant and highest quality research in the field of radiology. Each month the journal publishes approximately 240 pages of peer-reviewed original research, authoritative reviews, well-balanced commentary on significant articles, and expert opinion on new techniques and technologies. Radiology publishes cutting edge and impactful imaging research articles in radiology and medical imaging in order to help improve human health.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信