High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models

Findings · Pub Date: 2024-02-19 · DOI: 10.48550/arXiv.2402.12267
Michela Lorandi, Anya Belz
{"title":"High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models","authors":"Michela Lorandi, Anya Belz","doi":"10.48550/arXiv.2402.12267","DOIUrl":null,"url":null,"abstract":"The performance of NLP methods for severely under-resourced languages cannot currently hope to match the state of the art in NLP methods for well resourced languages. We explore the extent to which pretrained large language models (LLMs) can bridge this gap, via the example of data-to-text generation for Irish, Welsh, Breton and Maltese. We test LLMs on these under-resourced languages and English, in a range of scenarios. We find that LLMs easily set the state of the art for the under-resourced languages by substantial margins, as measured by both automatic and human evaluations. For all our languages, human evaluation shows on-a-par performance with humans for our best systems, but BLEU scores collapse compared to English, casting doubt on the metric’s suitability for evaluating non-task-specific systems. Overall, our results demonstrate the great potential of LLMs to bridge the performance gap for under-resourced languages.","PeriodicalId":508951,"journal":{"name":"Findings","volume":"86 3","pages":"1451-1461"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Findings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2402.12267","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

The performance of NLP methods for severely under-resourced languages cannot currently hope to match the state of the art in NLP methods for well-resourced languages. We explore the extent to which pretrained large language models (LLMs) can bridge this gap, using data-to-text generation for Irish, Welsh, Breton and Maltese as our example. We test LLMs on these under-resourced languages and on English, in a range of scenarios. We find that LLMs easily set the state of the art for the under-resourced languages by substantial margins, as measured by both automatic and human evaluations. For all our languages, human evaluation shows on-a-par-with-human performance for our best systems, but BLEU scores collapse compared to English, casting doubt on the metric's suitability for evaluating non-task-specific systems. Overall, our results demonstrate the great potential of LLMs to bridge the performance gap for under-resourced languages.
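To illustrate why single-reference BLEU can collapse for under-resourced languages even when outputs are judged fluent by humans, here is a minimal, unsmoothed BLEU sketch in pure Python. This is an illustrative reconstruction of the metric (modified n-gram precision plus brevity penalty), not the authors' evaluation code; published evaluations would normally use a standard toolkit such as sacrebleu.

```python
import math
from collections import Counter

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU against a single reference (illustrative only).

    Computes clipped n-gram precisions up to max_n and a brevity penalty.
    Smoothing is deliberately omitted, so a candidate sharing no n-gram of
    some order with its one reference scores exactly 0 -- the brittleness
    that penalises valid paraphrases in morphologically rich languages.
    """
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        # Count n-grams in candidate and reference, then clip overlaps.
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # any zero precision zeroes the geometric mean
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: no penalty if candidate is at least reference length.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return 100 * bp * geo_mean
```

For example, an exact match scores 100, while a short, lexically different but arguably valid rendering that shares no 4-gram with its single reference scores 0, regardless of its actual quality.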