美拉瑙语词形资源的自动获取

Suhaila Saee, Lay-Ki Soon, T. Lim, Bali Ranaivo-Malançon, J. Juk, E. Tang
{"title":"美拉瑙语词形资源的自动获取","authors":"Suhaila Saee, Lay-Ki Soon, T. Lim, Bali Ranaivo-Malançon, J. Juk, E. Tang","doi":"10.1109/IALP.2014.6973523","DOIUrl":null,"url":null,"abstract":"Computational morphological resources are the crucial component needed in providing morphological information to create morphological analyser. To acquire the morphological resources in a manual way, two main components are required. The components, which are preprocessing and morphology induction, have led to two issues: i) time consuming and ii) ambiguity in managing the resources from under-resourced languages perspective. We proposed an automatic acquisition of morphological resources tool, which is an extension from the manual way, to overcome the mentioned issues. In this work, three main modules in the proposed automatic tool are: i) tokenization - to tokenise a raw text and generate a wordlist, ii) conversion - to convert a softcopy of morphological resources into required formats and iii) integration of segmentation tools - to integrate two established segmentation tools, namely, Linguistica and Morfessor, in obtaining morphological information from the generated wordlist. Two testing methods have been conducted are component and integration testing. Result shows the proposed tool has been devised and the effectiveness has been demonstrated which allows the linguist to obtain their wordlist and segmented data easily. We believe the proposed tool will assist other researchers to construct computational morphological resources in automated way for under-resourced languages.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"192 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Automatic acquisition of morphological resources for Melanau language\",\"authors\":\"Suhaila Saee, Lay-Ki Soon, T. Lim, Bali Ranaivo-Malançon, J. Juk, E. Tang\",\"doi\":\"10.1109/IALP.2014.6973523\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Computational morphological resources are the crucial component needed in providing morphological information to create morphological analyser. To acquire the morphological resources in a manual way, two main components are required. The components, which are preprocessing and morphology induction, have led to two issues: i) time consuming and ii) ambiguity in managing the resources from under-resourced languages perspective. We proposed an automatic acquisition of morphological resources tool, which is an extension from the manual way, to overcome the mentioned issues. In this work, three main modules in the proposed automatic tool are: i) tokenization - to tokenise a raw text and generate a wordlist, ii) conversion - to convert a softcopy of morphological resources into required formats and iii) integration of segmentation tools - to integrate two established segmentation tools, namely, Linguistica and Morfessor, in obtaining morphological information from the generated wordlist. Two testing methods have been conducted are component and integration testing. Result shows the proposed tool has been devised and the effectiveness has been demonstrated which allows the linguist to obtain their wordlist and segmented data easily. We believe the proposed tool will assist other researchers to construct computational morphological resources in automated way for under-resourced languages.\",\"PeriodicalId\":117334,\"journal\":{\"name\":\"2014 International Conference on Asian Language Processing (IALP)\",\"volume\":\"192 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 International Conference on Asian Language Processing (IALP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IALP.2014.6973523\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Asian Language Processing (IALP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP.2014.6973523","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

计算形态学资源是提供形态学信息以创建形态学分析器所必需的关键组成部分。要手动获取形态学资源,需要两个主要组成部分。预处理和词法归纳两个组成部分导致了两个问题:1)耗时;2)从资源不足的语言角度管理资源的模糊性。为了克服上述问题,我们提出了一种从人工方式扩展而来的形态资源自动获取工具。在这项工作中,提出的自动工具中的三个主要模块是:i)标记化-对原始文本进行标记并生成词表;ii)转换-将形态学资源的软拷贝转换为所需格式;iii)分词工具集成-集成两个已建立的分词工具,即Linguistica和Morfessor,从生成的词表中获取形态学信息。测试方法主要有组件测试和集成测试两种。结果表明,所提出的工具已经被设计出来,并证明了它的有效性,使语言学家能够轻松地获得他们的词表和分割数据。我们相信所提出的工具将有助于其他研究人员以自动化的方式为资源不足的语言构建计算形态资源。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Automatic acquisition of morphological resources for Melanau language
Computational morphological resources are the crucial component needed in providing morphological information to create morphological analyser. To acquire the morphological resources in a manual way, two main components are required. The components, which are preprocessing and morphology induction, have led to two issues: i) time consuming and ii) ambiguity in managing the resources from under-resourced languages perspective. We proposed an automatic acquisition of morphological resources tool, which is an extension from the manual way, to overcome the mentioned issues. In this work, three main modules in the proposed automatic tool are: i) tokenization - to tokenise a raw text and generate a wordlist, ii) conversion - to convert a softcopy of morphological resources into required formats and iii) integration of segmentation tools - to integrate two established segmentation tools, namely, Linguistica and Morfessor, in obtaining morphological information from the generated wordlist. Two testing methods have been conducted are component and integration testing. Result shows the proposed tool has been devised and the effectiveness has been demonstrated which allows the linguist to obtain their wordlist and segmented data easily. We believe the proposed tool will assist other researchers to construct computational morphological resources in automated way for under-resourced languages.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信