Ensemble Self-Training for Low-Resource Languages: Grapheme-to-Phoneme Conversion and Morphological Inflection

Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.5

Xiang Yu, Ngoc Thang Vu, Jonas Kuhn

Citations: 14

Abstract

We present an iterative data augmentation framework that trains and searches for an optimal ensemble while simultaneously annotating new training data in a self-training style. We apply this framework to two SIGMORPHON 2020 shared tasks: grapheme-to-phoneme conversion and morphological inflection. With very simple base models in the ensemble, we rank first and fourth in these two tasks, respectively. Our analysis shows that the system works especially well on low-resource languages.
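The abstract gives no implementation details, so the following is only a minimal sketch of what an ensemble self-training loop of this general kind could look like, not the authors' system. Everything in it is an assumption made for illustration: the toy per-character base model (`train_char_model`), majority voting (`vote`), greedy forward selection on a development set (`greedy_ensemble`), and the agreement threshold used to accept automatically annotated items (`self_train`).

```python
import random
from collections import Counter

def train_char_model(pairs, seed):
    """Hypothetical toy base model: a per-character substitution table
    learned from a bootstrap sample of equal-length (source, target) pairs."""
    rng = random.Random(seed)
    sample = [rng.choice(pairs) for _ in pairs]
    counts = {}
    for src, tgt in sample:
        for s, t in zip(src, tgt):
            counts.setdefault(s, Counter())[t] += 1
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    # Unseen characters are mapped to "?" so they can be filtered out later.
    return lambda word: "".join(table.get(ch, "?") for ch in word)

def vote(models, word):
    """Return the majority prediction and the fraction of models agreeing."""
    preds = Counter(m(word) for m in models)
    best, n = preds.most_common(1)[0]
    return best, n / len(models)

def accuracy(models, dev):
    return sum(vote(models, x)[0] == y for x, y in dev) / len(dev)

def greedy_ensemble(models, dev):
    """Assumed search strategy: greedy forward selection, adding a model
    only while it strictly improves accuracy on the development set."""
    chosen, best_acc = [], -1.0
    improved = True
    while improved:
        improved = False
        best_m = None
        for m in models:
            if m in chosen:
                continue
            acc = accuracy(chosen + [m], dev)
            if acc > best_acc:
                best_acc, best_m, improved = acc, m, True
        if improved:
            chosen.append(best_m)
    return chosen

def self_train(labeled, unlabeled, dev, rounds=3, n_models=5, threshold=0.8):
    """Alternate between (1) training base models and selecting an ensemble
    and (2) labeling unlabeled inputs on which the ensemble agrees."""
    train, pool = list(labeled), list(unlabeled)
    ensemble = []
    for r in range(rounds):
        models = [train_char_model(train, seed=r * n_models + i)
                  for i in range(n_models)]
        ensemble = greedy_ensemble(models, dev)
        remaining = []
        for word in pool:
            pred, agree = vote(ensemble, word)
            if agree >= threshold and "?" not in pred:
                train.append((word, pred))   # accept as new training data
            else:
                remaining.append(word)       # retry in a later round
        pool = remaining
    return ensemble, train

if __name__ == "__main__":
    labeled = [("abc", "xyz"), ("cab", "zxy")]
    ensemble, train = self_train(labeled, ["bca", "abd"], [("bac", "yxz")])
    print(len(train), vote(ensemble, "cba")[0])
```

In this toy setting, confidently predicted items join the training set for the next round, while inputs containing unseen symbols stay in the pool. A real grapheme-to-phoneme or inflection system would replace the substitution table with sequence-to-sequence models and would likely use a different selection criterion.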


