Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages

Pierre Godard, L. Besacier, François Yvon, M. Adda-Decker, G. Adda, Hélène Maynard, Annie Rialland
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology
Published: 2018-10-01
DOI: 10.18653/v1/W18-5804 (https://doi.org/10.18653/v1/W18-5804)
Citations: 7

Abstract

Computational Language Documentation attempts to make the most recent research in speech and language technologies available to linguists working on language preservation and documentation. In this paper, we pursue two main goals along these lines. The first is to improve upon a strong baseline for the unsupervised word discovery task on two very low-resource Bantu languages, taking advantage of the expertise of linguists on these particular languages. The second is to explore the Adaptor Grammar framework as a decision and prediction tool for linguists studying a new language. We experiment with 162 grammar configurations for each language and show that using Adaptor Grammars for word segmentation enables us to test hypotheses about a language. Specializing a generic grammar with language-specific knowledge leads to large improvements on the word discovery task, ultimately achieving a gain of about 30% in token F-score over a strong baseline.
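
The token F-score mentioned above is the usual word segmentation metric: a predicted word token counts as correct only if both of its boundaries match a gold word. The following is a minimal sketch of that computation, not the authors' evaluation script. The function names and the toy English utterances are assumptions made for illustration, and words are assumed to be separated by single spaces.

```python
def word_spans(words):
    """Map a list of words to the set of (start, end) character spans they cover."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def token_f_score(gold_utterances, predicted_utterances):
    """Return (precision, recall, F-score) over word tokens.

    Both arguments are lists of space-segmented strings over the same
    underlying character sequences (hypothetical input format).
    """
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_utterances, predicted_utterances):
        gold_spans = word_spans(gold.split())
        pred_spans = word_spans(pred.split())
        # A token is correct when its start and end both match a gold token.
        correct += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example (not data from the paper): one under-segmented utterance.
precision, recall, f_score = token_f_score(["the dog barks"], ["thedog barks"])
print(precision, recall, f_score)
```

In the toy example only "barks" is recovered with both boundaries correct, so one of two predicted tokens and one of three gold tokens match.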