Inherit or discard: learning better domain-specific child networks from the general domain for multi-domain NMT

IF 2.7, Zone 3 (Computer Science), Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics Pub Date : 2024-07-08 DOI:10.1007/s13042-024-02253-w

Jinlei Xu, Yonghua Wen, Yan Xiang, Shuting Jiang, Yuxin Huang, Zhengtao Yu


Citations: 0

Abstract

Multi-domain NMT aims to develop a single parameter-sharing model for translating both general and specific domains (such as biology and law), but such a model often struggles with parameter interference. Existing approaches typically tackle this issue by learning a domain-specific sub-network for each domain and treating all domains equally, which ignores the significant data imbalance across domains. For instance, the general domain often has ten times as much training data as the biological domain. In this paper, we observe a natural similarity between the general and specific domains, such as shared vocabulary or similar sentence structure. We propose a novel parameter inheritance strategy to adaptively learn domain-specific child networks from the general domain. Our approach employs gradient similarity as the criterion for determining which parameters should be inherited or discarded between the general and specific domains. Extensive experiments on several multi-domain NMT corpora demonstrate that our method significantly outperforms several strong baselines. In addition, our method generalizes well when adapting to few-shot multi-domain NMT scenarios. Further investigations reveal that our method achieves good interpretability, because the parameters that the child network inherits from the general domain depend on how closely the specific domain is related to the general domain.
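The abstract names gradient similarity as the inherit-or-discard criterion but does not give the exact procedure. The following is only a minimal sketch of the general idea, under assumptions that are not from the paper: similarity is measured as cosine similarity between per-parameter-group gradients, a group is inherited when the similarity exceeds a fixed threshold, and the function names, parameter-group names, and toy gradient values are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two flat gradient vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no gradient signal: treat the two domains as unrelated
    return dot / (norm_u * norm_v)


def inheritance_mask(general_grads, domain_grads, threshold=0.0):
    """Map each parameter group to True (inherit) or False (discard).

    ASSUMPTION: a group is inherited from the general domain when its
    general-domain and specific-domain gradients point in a similar
    direction (cosine similarity above `threshold`). The paper's actual
    criterion may differ.
    """
    return {
        name: cosine_similarity(general_grads[name], domain_grads[name]) > threshold
        for name in general_grads
    }


# Toy gradients for two hypothetical parameter groups.
general = {
    "encoder.layer0": [0.5, -0.2, 0.1],
    "decoder.layer0": [0.3, 0.3, -0.4],
}
biology = {
    "encoder.layer0": [0.4, -0.1, 0.2],   # agrees with the general domain
    "decoder.layer0": [-0.3, -0.2, 0.5],  # conflicts with the general domain
}

mask = inheritance_mask(general, biology)
for name, inherit in mask.items():
    print(name, "inherit" if inherit else "discard")
```

In this toy setting the encoder group would be inherited by the biology child network, because its gradients agree with the general domain, while the decoder group would be discarded, because its gradients conflict. This mirrors the intuition that conflicting gradients signal parameter interference.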


Source journal

International Journal of Machine Learning and Cybernetics (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 7.90

Self-citation rate: 10.70%

Articles published: 225

Journal description: Cybernetics is concerned with describing complex interactions and interrelationships between systems which are omnipresent in our daily life. Machine Learning discovers fundamental functional relationships between variables and ensembles of variables in systems. The merging of the disciplines of Machine Learning and Cybernetics is aimed at the discovery of various forms of interaction between systems through diverse mechanisms of learning from data. The International Journal of Machine Learning and Cybernetics (IJMLC) focuses on the key research problems emerging at the junction of machine learning and cybernetics and serves as a broad forum for rapid dissemination of the latest advancements in the area. The emphasis of IJMLC is on the hybrid development of machine learning and cybernetics schemes inspired by different contributing disciplines such as engineering, mathematics, cognitive sciences, and applications. New ideas, design alternatives, implementations and case studies pertaining to all the aspects of machine learning and cybernetics fall within the scope of the IJMLC. Key research areas to be covered by the journal include: machine learning for modeling interactions between systems; pattern recognition technology to support discovery of system-environment interaction; control of system-environment interactions; biochemical interaction in biological and biologically-inspired systems; and learning for improvement of communication schemes between systems.