Data-Driven Syllabification for Middle Dutch

Digital Medievalist. Pub Date: 2019-11-04. DOI: 10.16995/dm.83
Wouter Haverals, Folgert Karsdorp, M. Kestemont
Citations: 2

Abstract

Automatically separating Middle Dutch words into syllables is a challenging task. A first method was presented by Bouma and Hermans (2012), who combined a rule-based finite-state component with data-driven error correction. With an average word accuracy of 96.5%, their system is satisfactory, although it leaves room for improvement. Generally speaking, rule-based methods are less attractive for a medieval language like Middle Dutch, where each dialect has its own spelling preferences and there is also much idiosyncratic variation among scribes. This paper presents a different method for automatically syllabifying Middle Dutch words, one that does not rely on a set of pre-defined linguistic information. Using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells, we obtain a system which outperforms the rule-based method both in robustness and in the effort required.
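The abstract does not say how words are presented to the network, so the following is only a minimal sketch of one common way to frame syllabification as a sequence-labelling problem: each character receives a label of 1 if a syllable boundary follows it and 0 otherwise. It also shows how a word-accuracy figure such as the 96.5% quoted above could be computed. The function names (`to_labels`, `from_labels`, `word_accuracy`) and the example hyphenations are illustrative assumptions, not taken from the paper or its data.

```python
from typing import List, Sequence


def to_labels(syllables: Sequence[str]) -> List[int]:
    """Encode a syllabified word as one 0/1 label per character.

    A label of 1 means "a syllable boundary follows this character".
    (Assumed encoding; the abstract does not specify one.)
    """
    labels: List[int] = []
    for i, syl in enumerate(syllables):
        if not syl:
            raise ValueError("empty syllable")
        is_last = i == len(syllables) - 1
        labels.extend([0] * (len(syl) - 1))
        labels.append(0 if is_last else 1)
    return labels


def from_labels(word: str, labels: Sequence[int]) -> List[str]:
    """Decode per-character boundary labels back into syllables."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    syllables: List[str] = []
    start = 0
    for i, label in enumerate(labels):
        if label == 1:
            syllables.append(word[start:i + 1])
            start = i + 1
    if start < len(word):
        syllables.append(word[start:])
    return syllables


def word_accuracy(gold: Sequence[Sequence[str]],
                  predicted: Sequence[Sequence[str]]) -> float:
    """Fraction of words whose full syllabification is exactly right."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(list(g) == list(p) for g, p in zip(gold, predicted))
    return correct / len(gold)


if __name__ == "__main__":
    # Hand-picked illustrative hyphenation, not from the paper's dataset.
    labels = to_labels(["co", "ninc"])
    print(labels)
    print(from_labels("coninc", labels))
```

Under this framing, a recurrent model such as an LSTM would be trained to predict the label sequence from the character sequence, and `from_labels` would turn its predictions back into syllables. Word accuracy is a strict measure: a single misplaced boundary makes the whole word count as wrong.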