CF Planter: A Toolset for Semi-automatic Thai Treebank Construction

Pechlada Seenual, Thodsaporn Chay-intr, T. Theeramunkong
DOI: 10.1109/ICESIT-ICICTES.2018.8442061
Published: 2018-05-01
Citation count: 2

Abstract

To speed up treebank construction, an integrated annotation tool is needed that includes a word segmenter, a sentence parser for initial tree suggestion, a tree visualizer, a tree-structure editor, and collaborative functions. Existing tools have not offered an integrated platform that provides preprocessing, an automated or semi-automated mechanism for parse tree suggestion, and management of tagged corpus data. This paper presents CF Planter, a toolset for semi-automatic Thai treebank construction that consists of a word segmenter, a part-of-speech tagger, a statistical parser, and a web-based GUI for syntactic tree refinement and management. Given an input sentence, its most likely syntactic tree is automatically suggested and visualized for an annotator, who corrects it manually before it is added to the treebank repository. Whenever a new syntactic tree is added to the treebank, the repository is iteratively refined by computing a revised set of grammar rules with updated probabilities. The toolset's operation is illustrated with grammar rule frequencies. The toolset makes it easy for annotators to tag the tree structure of an input sentence. Finally, the automatic syntactic tree suggestion process is evaluated.
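
The refinement step described in the abstract, recomputing grammar rule probabilities each time a corrected tree enters the treebank, can be sketched as follows. This is a minimal illustration and not the CF Planter implementation: it assumes a probabilistic context-free grammar estimated by relative frequency, and the tree encoding (nested tuples), the `Treebank` class, the `productions` helper, and the placeholder tokens `w1` to `w3` are all hypothetical.

```python
from collections import Counter


def productions(tree):
    """Yield (lhs, rhs) rules from a tree encoded as (label, child, ...).

    A child is either a nested tuple (a subtree) or a string (a word).
    This encoding is an assumption made for the sketch.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


class Treebank:
    """Hypothetical repository that keeps rule counts up to date."""

    def __init__(self):
        self.rule_counts = Counter()  # count of each (lhs, rhs) rule
        self.lhs_counts = Counter()   # count of each left-hand side

    def add_tree(self, tree):
        """Add one annotator-approved tree and update the counts."""
        for lhs, rhs in productions(tree):
            self.rule_counts[(lhs, rhs)] += 1
            self.lhs_counts[lhs] += 1

    def rule_probabilities(self):
        """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
        return {
            rule: n / self.lhs_counts[rule[0]]
            for rule, n in self.rule_counts.items()
        }


bank = Treebank()
bank.add_tree(("S", ("NP", ("N", "w1")), ("VP", ("V", "w2"))))
probs_after_one = bank.rule_probabilities()

# A second tree introduces a competing VP rule, so the VP probabilities shift.
bank.add_tree(
    ("S", ("NP", ("N", "w1")), ("VP", ("V", "w2"), ("NP", ("N", "w3"))))
)
probs_after_two = bank.rule_probabilities()
```

Because only counts are stored, adding a tree costs time proportional to the size of that tree, and the revised probabilities can be handed to a statistical parser for the next suggestion.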