Speech translation for multilingual medical education leveraged by large language models
Jorge Iranzo-Sánchez, Jaume Santamaría-Jordà, Gerard Mas-Mollà, Gonçal V. Garcés Díaz-Munío, Javier Iranzo-Sánchez, Javier Jorge, Joan Albert Silvestre-Cerdà, Adrià Giménez, Jorge Civera, Albert Sanchis, Alfons Juan
Artificial Intelligence in Medicine, volume 166, Article 103147. Published 2025-05-14.
DOI: 10.1016/j.artmed.2025.103147
https://www.sciencedirect.com/science/article/pii/S093336572500082X
Abstract
The application of large language models (LLMs) to speech translation (ST) and, more generally, to machine translation (MT) has recently yielded excellent results, surpassing conventional encoder–decoder MT systems in the general domain. However, it is not clear that this also holds when LLMs are used as MT systems to translate medical materials. In this respect, providing multilingual training materials for oncology professionals is a goal of the EU project Interact-Europe, within which this work was carried out. To this end, cross-language technology adapted to the oncology domain was developed, evaluated and deployed for multilingual interspecialty medical education. More precisely, automatic speech recognition (ASR) and MT models were adapted to the oncology domain to translate pre-recorded English training videos, kindly provided by the European School of Oncology (ESO), into French, Spanish, German and Slovene. In this work, three categories of MT models adapted to the medical domain were assessed: bilingual encoder–decoder MT models trained from scratch, pre-trained large multilingual encoder–decoder MT models, and multilingual decoder-only LLMs. The experimental results underline that LLMs are competitive in translation quality with encoder–decoder MT models. Finally, the ESO speech dataset, comprising roughly 1000 videos and 745 hours for the training and evaluation of ASR, MT and ST models, was publicly released to the scientific community.
About the journal:
Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care.
Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.