A Construction of Hybrid Structural Thai Treebank
Pannathorn Naksung, Chayaphat Nicrothanon, Putthichot Chunjiree, Thodsaporn Chay-intr, T. Theeramunkong
Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval, 2019-06-28
DOI: https://doi.org/10.1145/3342827.3342842
Abstract
Incorporating richer structures into individual syntactic trees can enhance the usefulness of a parsed text corpus. Existing work on Thai treebank construction has sought to address the lack of high-level syntactic resources, but the available resources remain insufficient for Thai Natural Language Processing. Furthermore, existing Thai treebanks provide either syntactic structure or dependency structure, but not both. This paper presents the construction of a hybrid structural Thai treebank that includes both syntactic and dependency structures, a tool for converting between constituency and dependency parse trees, and a web-based GUI for parse tree visualization. To build the hybrid treebank, hundreds of constituent trees are manually annotated with the predicate head of each phrase. Once the set of annotated constituent trees is obtained, the conversion procedure is performed by identifying each annotated head and its dependents. In our experiments, features of the hybrid treebank are extracted and illustrated. Finally, difficulties and issues in constructing the hybrid Thai treebank are discussed.
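The conversion step described in the abstract, going from head-annotated constituent trees to dependencies, can be illustrated with a minimal sketch. This is not the authors' tool: the `Node` class, the `head_child` field, the `to_dependencies` function, the phrase labels, and the toy sentence "แมว กิน ปลา" ("cat eats fish") are all assumptions made for illustration. The sketch assumes each phrase records which of its children is the head; the lexical head of a phrase is then the lexical head of that child, and every non-head child's lexical head becomes a dependent of it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical constituent-tree node (not the paper's data format)."""
    label: str                                   # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on leaves
    head_child: Optional[int] = None             # index of the annotated head child


def to_dependencies(root: Node) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return (tokens, arcs). Each arc is (head_index, dependent_index);
    the root token gets head_index -1."""
    tokens: List[str] = []
    arcs: List[Tuple[int, int]] = []

    def lexical_head(node: Node) -> int:
        # A leaf is its own lexical head; record the token in sentence order.
        if node.word is not None:
            tokens.append(node.word)
            return len(tokens) - 1
        # Visit children left to right so token indices follow word order.
        heads = [lexical_head(child) for child in node.children]
        head = heads[node.head_child]
        # Every non-head child's lexical head depends on the phrase's head.
        for i, dependent in enumerate(heads):
            if i != node.head_child:
                arcs.append((head, dependent))
        return head

    arcs.append((-1, lexical_head(root)))
    return tokens, sorted(arcs, key=lambda arc: arc[1])


# Toy tree: S -> NP(แมว) VP(กิน NP(ปลา)); the VP heads S and the verb heads the VP.
tree = Node("S", head_child=1, children=[
    Node("NP", head_child=0, children=[Node("N", word="แมว")]),
    Node("VP", head_child=0, children=[
        Node("V", word="กิน"),
        Node("NP", head_child=0, children=[Node("N", word="ปลา")]),
    ]),
])

tokens, arcs = to_dependencies(tree)
for head, dependent in arcs:
    head_word = "ROOT" if head == -1 else tokens[head]
    print(f"{tokens[dependent]} <- {head_word}")
```

In this toy case the verb becomes the root and both nouns attach to it. The sketch produces unlabeled arcs only; dependency relation labels, and the reverse conversion from dependencies to constituents, would need additional information that the abstract does not specify.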