Glossy Bytes: Neural Glossing using Subword Encoding

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 1900-01-01 DOI:10.18653/v1/2023.sigmorphon-1.24

Ziggy Cross, Michelle Yun, Ananya Apparaju, Jata MacCabe, Garrett Nicolai, Miikka Silfverberg

Citations: 1

Abstract

This paper presents several neural approaches to interlinear glossing, all based on subword modelling, for seven under-resourced languages as part of the 2023 SIGMORPHON shared task on interlinear glossing. We experiment with various augmentation and tokenization strategies for both the open and closed tracks of the task. We find that while byte-level models may perform well when more data is available, character-based approaches remain competitive in lower-resource settings.
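To make the contrast between character-based and byte-level input concrete, here is a minimal sketch of how the same word splits under each scheme. It is only an illustration, not the authors' pipeline: the function names `char_tokens` and `byte_tokens` are hypothetical, and the example word is not taken from the shared task data.

```python
def char_tokens(word: str) -> list[str]:
    """Split a word into Unicode characters (character-level input)."""
    return list(word)


def byte_tokens(word: str) -> list[int]:
    """Split a word into its UTF-8 byte values (byte-level input)."""
    return list(word.encode("utf-8"))


# Illustrative word only; "\u00f1" is the precomposed character n-with-tilde.
word = "ni\u00f1o"

chars = char_tokens(word)
byts = byte_tokens(word)

print(len(chars), chars)  # one token per character
print(len(byts), byts)    # non-ASCII characters expand to several bytes

# Byte sequences are lossless: decoding recovers the original string.
assert bytes(byts).decode("utf-8") == word
```

For text outside ASCII, the byte sequence is longer than the character sequence, but the byte vocabulary is fixed at 256 symbols regardless of the language's orthography.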

