Inherit or discard: learning better domain-specific child networks from the general domain for multi-domain NMT

Impact factor 3.1; Zone 3 (Computer Science); JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Jinlei Xu, Yonghua Wen, Yan Xiang, Shuting Jiang, Yuxin Huang, Zhengtao Yu
International Journal of Machine Learning and Cybernetics. Published 2024-07-08. Journal article, not open access.
DOI: 10.1007/s13042-024-02253-w (https://doi.org/10.1007/s13042-024-02253-w)
Citations: 0

Abstract

Multi-domain NMT aims to develop a single parameter-sharing model that translates both the general domain and specific domains such as biology and law. Such models often suffer from parameter interference. Existing approaches typically tackle this issue by learning a domain-specific sub-network for each domain, treating all domains equally, but they ignore the significant data imbalance across domains. For instance, the training data for the general domain is often ten times larger than that for the biological domain. In this paper, we observe a natural similarity between the general and specific domains, including shared vocabulary and similar sentence structure. We propose a novel parameter inheritance strategy that adaptively learns domain-specific child networks from the general domain. Our approach uses gradient similarity as the criterion for deciding which parameters a specific domain should inherit from the general domain and which it should discard. Extensive experiments on several multi-domain NMT corpora demonstrate that our method significantly outperforms several strong baselines. In addition, our method generalizes well when adapting to few-shot multi-domain NMT scenarios. Further investigation shows that our method is interpretable: the parameters a child network inherits from the general domain depend on how closely the specific domain is related to the general domain.
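The inherit-or-discard criterion can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the granularity at which it is applied, or the decision threshold, so everything below is an assumption made for illustration: cosine similarity, one decision per named parameter group, a threshold of 0, and the hypothetical names `cosine_similarity`, `inheritance_mask`, `encoder.ffn`, and `decoder.ffn`. It is not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero gradient carries no agreement signal
    return dot / (norm_u * norm_v)


def inheritance_mask(general_grads, domain_grads, threshold=0.0):
    """Map each parameter-group name to True (inherit) or False (discard).

    Assumption: a group is inherited when the general-domain and
    specific-domain gradients point in a similar direction, i.e. their
    cosine similarity exceeds `threshold`.
    """
    return {
        name: cosine_similarity(general_grads[name], domain_grads[name]) > threshold
        for name in general_grads
    }


# Toy gradients for two hypothetical parameter groups.
general = {"encoder.ffn": [0.5, 0.1, -0.2], "decoder.ffn": [0.3, -0.4, 0.1]}
biology = {"encoder.ffn": [0.4, 0.2, -0.1], "decoder.ffn": [-0.3, 0.5, 0.0]}

mask = inheritance_mask(general, biology)
for name, inherit in mask.items():
    print(name, "inherit" if inherit else "discard")
```

In this toy setup the gradients for `encoder.ffn` agree in direction, so the group is inherited, while those for `decoder.ffn` conflict, so it is discarded. This matches the intuition that conflicting gradients signal parameter interference between domains.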


Source journal
International Journal of Machine Learning and Cybernetics (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
CiteScore: 7.90
Self-citation rate: 10.70%
Publication volume: 225
About the journal: Cybernetics is concerned with describing complex interactions and interrelationships between systems which are omnipresent in our daily life. Machine Learning discovers fundamental functional relationships between variables and ensembles of variables in systems. The merging of the disciplines of Machine Learning and Cybernetics is aimed at the discovery of various forms of interaction between systems through diverse mechanisms of learning from data. The International Journal of Machine Learning and Cybernetics (IJMLC) focuses on the key research problems emerging at the junction of machine learning and cybernetics and serves as a broad forum for rapid dissemination of the latest advancements in the area. The emphasis of IJMLC is on the hybrid development of machine learning and cybernetics schemes inspired by different contributing disciplines such as engineering, mathematics, cognitive sciences, and applications. New ideas, design alternatives, implementations and case studies pertaining to all the aspects of machine learning and cybernetics fall within the scope of the IJMLC. Key research areas to be covered by the journal include: Machine Learning for modeling interactions between systems; Pattern Recognition technology to support discovery of system-environment interaction; Control of system-environment interactions; Biochemical interaction in biological and biologically-inspired systems; Learning for improvement of communication schemes between systems.