Phylogeny-Inspired Adaptation of Multilingual Models to New Languages

Q3 Environmental Science

AACL Bioflux. Pub Date: 2022-05-19. DOI: 10.48550/arXiv.2205.09634

Fahim Faisal, Antonios Anastasopoulos


Citations: 16

Abstract


Phylogeny-Inspired Adaptation of Multilingual Models to New Languages

Large pretrained multilingual models, trained on dozens of languages, have delivered promising results due to cross-lingual learning capabilities on a variety of language tasks. Further adapting these models to specific languages, especially ones unseen during pre-training, is an important goal toward expanding the coverage of language technologies. In this study, we show how we can use language phylogenetic information to improve cross-lingual transfer leveraging closely related languages in a structured, linguistically-informed manner. We perform adapter-based training on languages from diverse language families (Germanic, Uralic, Tupian, Uto-Aztecan) and evaluate on both syntactic and semantic tasks, obtaining more than 20% relative performance improvements over strong commonly used baselines, especially on languages unseen during pre-training.
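One way to picture "structured, linguistically-informed" transfer is to organize adapters along a language's path in a phylogenetic tree, so that closely related languages share their upper-level adapters. The sketch below only computes such paths. The tree, the names `PARENT`, `adapter_path`, and `shared_depth`, and the idea of one adapter per tree node are illustrative assumptions, not details confirmed by the abstract.

```python
# Hypothetical child -> parent links for a small slice of the Germanic family.
# The tree and all names here are assumptions made for illustration.
PARENT = {
    "Germanic": None,
    "North Germanic": "Germanic",
    "West Germanic": "Germanic",
    "Faroese": "North Germanic",
    "Icelandic": "North Germanic",
    "German": "West Germanic",
}


def adapter_path(language):
    """Return the tree nodes from the family root down to `language`."""
    path = []
    node = language
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]  # root first, target language last


def shared_depth(a, b):
    """Count how many leading nodes (shared adapters) two languages have in common."""
    depth = 0
    for x, y in zip(adapter_path(a), adapter_path(b)):
        if x != y:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    print(adapter_path("Faroese"))
    print(shared_depth("Faroese", "Icelandic"), shared_depth("Faroese", "German"))
```

Under this assumed layout, Faroese and Icelandic would share both the family-level and the branch-level adapter, while Faroese and German would share only the family-level one.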


Source journal

AACL Bioflux Environmental Science-Management, Monitoring, Policy and Law

CiteScore

1.40

Self-citation rate

0.00%

Publication count