{"title":"Multilingual neural machine translation by cascading computational graphs","authors":"Abouzar Qorbani , Reza Ramezani , Ahmad Baraani , Arefeh Kazemi","doi":"10.1016/j.eswa.2025.128722","DOIUrl":null,"url":null,"abstract":"<div><div>In the era of artificial intelligence, multilingual models have become increasingly vital in machine translation tasks. However, Multilingual Neural Machine Translation (MNMT) faces persistent challenges, notably reduced translation quality and language interference. When training on diverse language pairs, the translation performance for certain languages may degrade due to negative transfer effects. To address this problem, researchers have proposed various strategies such as parameter sharing, partial sharing, and language-specific parameterization. Despite these efforts, limitations remain—including high data requirements, reliance on linguistic relatedness, inflexibility in model architecture adaptation during training, and negative inference (producing output in an unintended language). The identification and targeted modification of effective and ineffective nodes within a neural model can substantially enhance translation performance, particularly for low-resource and extremely low-resource languages. In this paper, a novel method is proposed that identifies ineffective nodes in an MNMT model and corrects them by twinning with effective counterparts. This is achieved through computational graph grouping based on semantic similarity. The proposed method has been evaluated on several multilingual datasets, including TED2013, TED2020, and BIBLE. Relative to baseline models, the proposed method demonstrates notable improvements in BLEU scores—achieving relative gains of 23.7 % on TED2013, 7.06 % on TED2020, and 16.9 % on BIBLE. It also outperforms large-scale systems such as ChatGPT, Bing GPT-4, and Google Neural Machine Translation (GNMT) across all evaluated datasets.
Furthermore, the performance has been assessed on the extremely low-resource language pair English–Igbo using the OPUS-100 dataset. The results show that the proposed method outperforms baseline models by 2.58 %, while the large-scale Madlad400-3B model, despite its depth (32 layers, 450 languages), struggles in this setting. Similarly, the Semlin-MNMT model performs well for high-resource pairs but shows significant degradation on low-resource languages. Overall, our proposed method provides a robust and scalable approach for enhancing MNMT quality in both one-to-many and many-to-many translation scenarios. Its effectiveness in low-resource and extremely low-resource settings highlights its practical value and contribution to the advancement of multilingual translation systems.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"294 ","pages":"Article 128722"},"PeriodicalIF":7.5000,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multilingual neural machine translation by cascading computational graphs\",\"authors\":\"Abouzar Qorbani , Reza Ramezani , Ahmad Baraani , Arefeh Kazemi\",\"doi\":\"10.1016/j.eswa.2025.128722\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In the era of artificial intelligence, multilingual models have become increasingly vital in machine translation tasks. However, Multilingual Neural Machine Translation (MNMT) faces persistent challenges, notably reduced translation quality and language interference. When training on diverse language pairs, the translation performance for certain languages may degrade due to negative transfer effects. To address this problem, researchers have proposed various strategies such as parameter sharing, partial sharing, and language-specific parameterization. 
Despite these efforts, limitations remain—including high data requirements, reliance on linguistic relatedness, inflexibility in model architecture adaptation during training, and negative inference (producing output in an unintended language). The identification and targeted modification of effective and ineffective nodes within a neural model can substantially enhance translation performance, particularly for low-resource and extremely low-resource languages. In this paper, a novel method is proposed that identifies ineffective nodes in an MNMT model and corrects them by twinning with effective counterparts. This is achieved through computational graph grouping based on semantic similarity. The proposed method has been evaluated on several multilingual datasets, including TED2013, TED2020, and BIBLE. Relative to baseline models, the proposed method demonstrates notable improvements in BLEU scores—achieving relative gains of 23.7 % on TED2013, 7.06 % on TED2020, and 16.9 % on BIBLE. It also outperforms large-scale systems such as ChatGPT, Bing GPT-4, and Google Neural Machine Translation (GNMT) across all evaluated datasets. Furthermore, the performance has been assessed on the extremely low-resource language pair English–Igbo using the OPUS-100 dataset. The results show that the proposed method outperforms baseline models by 2.58 %, while the large-scale Madlad400-3B model, despite its depth (32 layers, 450 languages), struggles in this setting. Similarly, the Semlin-MNMT model performs well for high-resource pairs but shows significant degradation on low-resource languages. Overall, our proposed method provides a robust and scalable approach for enhancing MNMT quality in both one-to-many and many-to-many translation scenarios. 
Its effectiveness in low-resource and extremely low-resource settings highlights its practical value and contribution to the advancement of multilingual translation systems.</div></div>\",\"PeriodicalId\":50461,\"journal\":{\"name\":\"Expert Systems with Applications\",\"volume\":\"294 \",\"pages\":\"Article 128722\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2025-06-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems with Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0957417425023401\",\"RegionNum\":1,\"RegionCategory\":\"Computer Science\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425023401","RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
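The abstract's core mechanism can be illustrated with a minimal sketch: score the nodes of a computational graph, group them by semantic (cosine) similarity, and "twin" each ineffective node with the most effective node in its group. This is a hypothetical illustration only; the function names, the 0.8 similarity threshold, the 0.5 effectiveness cutoff, and the parameter-averaging rule are assumptions, not the authors' implementation.

```python
# Hedged sketch of node grouping and twinning, not the paper's actual method.
import numpy as np

def cosine(a, b):
    """Cosine similarity between two node embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_by_similarity(embeddings, threshold=0.8):
    """Greedy grouping: a node joins the first group whose seed it resembles."""
    groups = []  # each group is a list of node indices
    for i, emb in enumerate(embeddings):
        for g in groups:
            if cosine(embeddings[g[0]], emb) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def twin_ineffective(params, scores, groups, cutoff=0.5):
    """Blend each low-scoring node's parameters with its group's best node."""
    params = [p.copy() for p in params]
    for g in groups:
        best = max(g, key=lambda i: scores[i])  # most effective node in group
        for i in g:
            if scores[i] < cutoff and i != best:
                # "Twinning" here is a simple average; the real update rule
                # would come from the paper.
                params[i] = 0.5 * (params[i] + params[best])
    return params
```

Under these assumptions, two nodes with near-parallel embeddings fall into one group, and the weaker one is pulled toward its effective twin while well-scoring nodes are left untouched.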
Journal introduction:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.