Quality of Machine Translations in Medical Texts: An Analysis Based on Standardised Evaluation Metrics.

Pascal Block, Johanna Schaefer, Felix Maurer, Holger Storf
Studies in Health Technology and Informatics, vol. 331, pp. 63-72. Published 2025-09-03. DOI: 10.3233/SHTI251380

Abstract

Introduction: The medical care of patients with rare diseases is a cross-border concern across the EU. This is also reflected in the usage statistics of the SE-ATLAS, where most access occurs via browser languages set to German, English, French, or Polish. The SE-ATLAS website provides information on healthcare services and patient organisations for rare diseases in Germany. As SE-ATLAS currently offers its content almost exclusively in German, non-German-speaking users may encounter language barriers. Against this background, this paper explores whether common machine translation systems can translate medical texts into other languages at a reasonable level of quality.

Methods: For this purpose, the translation systems DeepL, ChatGPT, and Google Translate were analysed. Translation quality was assessed using the standardised metrics BLEU, METEOR, and COMET. In contrast to subjective human assessments, these automated metrics allow for objective and reproducible evaluation. The analysis focused on machine-generated translations of German-language texts from the OPUS corpus into English, French, and Polish, each compared against existing reference translations.
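Of the three metrics, BLEU is the most transparent: it is the geometric mean of clipped n-gram precisions between a candidate and a reference translation, scaled by a brevity penalty. The following is a minimal, self-contained sketch of sentence-level BLEU in pure Python; it is illustrative only (the paper presumably used a standard toolkit such as sacreBLEU, and real implementations add smoothing and proper tokenisation).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand = candidate.split()
    ref = reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a correct word cannot inflate the score.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Tiny floor avoids log(0); real toolkits use proper smoothing.
        log_precisions.append(math.log(max(clipped / total, 1e-9)))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "the patient was referred to a specialist"
print(bleu(ref, ref))                                        # identical: 1.0
print(bleu("the patient was referred to a clinic", ref))     # one word off
```

Because BLEU rewards exact surface overlap, a fluent translation that uses valid synonyms still scores low, which is one common explanation for BLEU values sitting below METEOR (which matches stems and synonyms) and COMET (which uses learned embeddings).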

Results: BLEU scores were generally lower than those of the other metrics, whereas METEOR and COMET indicated moderate to high translation quality. Translations into English were consistently rated higher than those into French and Polish.

Conclusion: As the three analysed translation systems showed hardly any statistically significant differences in translation quality and all delivered acceptable results, further criteria should be taken into account when choosing an appropriate system. These include factors such as data protection, cost-efficiency, and ease of integration.
