Fine-grained Multi-lingual Disentangled Autoencoder for Language-agnostic Representation Learning

Zetian Wu, Zhongkai Sun, Zhengyang Zhao, Sixing Lu, Chengyuan Ma, Chenlei Guo
{"title":"Fine-grained Multi-lingual Disentangled Autoencoder for Language-agnostic Representation Learning","authors":"Zetian Wu, Zhongkai Sun, Zhengyang Zhao, Sixing Lu, Chengyuan Ma, Chenlei Guo","doi":"10.18653/v1/2022.mmnlu-1.2","DOIUrl":null,"url":null,"abstract":"Encoding both language-specific and language-agnostic information into a single high-dimensional space is a common practice of pre-trained Multi-lingual Language Models (pMLM). Such encoding has been shown to perform effectively on natural language tasks requiring semantics of the whole sentence (e.g., translation). However, its effectiveness appears to be limited on tasks requiring partial information of the utterance (e.g., multi-lingual entity retrieval, template retrieval, and semantic alignment). In this work, a novel Fine-grained Multilingual Disentangled Autoencoder (FMDA) is proposed to disentangle fine-grained semantic information from language-specific information in a multi-lingual setting. FMDA is capable of successfully extracting the disentangled template semantic and residual semantic representations. Experiments conducted on the MASSIVE dataset demonstrate that the disentangled encoding can boost each other during the training, thus consistently outperforming the original pMLM and the strong language disentanglement baseline on monolingual template retrieval and cross-lingual semantic retrieval tasks across multiple languages.","PeriodicalId":375461,"journal":{"name":"Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.mmnlu-1.2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Encoding both language-specific and language-agnostic information into a single high-dimensional space is a common practice in pre-trained Multi-lingual Language Models (pMLMs). Such encoding has been shown to perform effectively on natural language tasks that require the semantics of the whole sentence (e.g., translation). However, its effectiveness appears limited on tasks that require only partial information from the utterance (e.g., multi-lingual entity retrieval, template retrieval, and semantic alignment). In this work, a novel Fine-grained Multi-lingual Disentangled Autoencoder (FMDA) is proposed to disentangle fine-grained semantic information from language-specific information in a multi-lingual setting. FMDA successfully extracts disentangled template-semantic and residual-semantic representations. Experiments on the MASSIVE dataset demonstrate that the two disentangled encodings boost each other during training, so that FMDA consistently outperforms the original pMLM and a strong language-disentanglement baseline on monolingual template retrieval and cross-lingual semantic retrieval tasks across multiple languages.
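The abstract describes the architecture only at a high level. The following is a minimal sketch of the general disentanglement idea, not the authors' implementation: every name, dimension, and loss choice below (DisentangledAutoencoder, the three projection heads, hidden=768, sub=256, num_langs=51 for MASSIVE's languages, and the reconstruction plus language-classification objectives) is an illustrative assumption.

```python
# Minimal sketch of a disentangled autoencoder on top of a pMLM sentence
# embedding (an ASSUMED design, not the paper's architecture): the pooled
# encoding is projected into three subspaces (template semantics, residual
# semantics, language identity); a decoder reconstructs the pooled encoding
# from their concatenation, and a classifier supervises only the language
# subspace so that language-specific information is drawn out of the two
# semantic subspaces.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledAutoencoder(nn.Module):            # hypothetical name
    def __init__(self, hidden=768, sub=256, num_langs=51):
        super().__init__()
        self.template_head = nn.Linear(hidden, sub)  # template-semantic subspace
        self.residual_head = nn.Linear(hidden, sub)  # residual-semantic subspace
        self.language_head = nn.Linear(hidden, sub)  # language-specific subspace
        self.decoder = nn.Linear(3 * sub, hidden)    # reconstructs the pooled encoding
        self.lang_clf = nn.Linear(sub, num_langs)    # supervises only the language subspace

    def forward(self, pooled):                       # pooled: (batch, hidden) from a pMLM
        t = self.template_head(pooled)
        r = self.residual_head(pooled)
        g = self.language_head(pooled)
        recon = self.decoder(torch.cat([t, r, g], dim=-1))
        return t, r, g, recon, self.lang_clf(g)

# Toy usage with random stand-ins for pMLM sentence embeddings.
model = DisentangledAutoencoder()
pooled = torch.randn(4, 768)
lang_ids = torch.randint(0, 51, (4,))
t, r, g, recon, lang_logits = model(pooled)
loss = F.mse_loss(recon, pooled) + F.cross_entropy(lang_logits, lang_ids)
loss.backward()
```

The intuition matching the abstract's claim is that the reconstruction objective forces the subspaces to jointly preserve sentence content, while attaching the language classifier to a single subspace pushes language-specific information out of the template-semantic and residual-semantic representations, which can then serve retrieval tasks that need only partial information from the utterance.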