The Computational Method for Supporting Thai VerbNet Construction

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing Pub Date : 2023-12-26 DOI:10.1145/3638533

Krittanut Chungnoi, Rachada Kongkachandra, Sarun Gulyanon

{"title":"The Computational Method for Supporting Thai VerbNet Construction","authors":"Krittanut Chungnoi, Rachada Kongkachandra, Sarun Gulyanon","doi":"10.1145/3638533","DOIUrl":null,"url":null,"abstract":"<p>VerbNet is a lexical resource for verbs that has many applications in natural language processing tasks, especially ones that require information about both the syntactic behavior and the semantics of verbs. This paper presents an attempt to construct the first version of a Thai VerbNet corpus via data enrichment of the existing lexical resource. This corpus contains the annotation at both the syntactic and semantic levels, where verbs are tagged with frames within the verb class hierarchy and their arguments are labeled with the semantic role. We discuss the technical aspect of the construction process of Thai VerbNet and survey different semantic role labeling methods to make this process fully automatic. We also investigate the linguistic aspect of the computed verb classes and the results show the potential in assisting semantic classification and analysis. At the current stage, we have built the verb class hierarchy consisting of 28 verb classes from 112 unique concept frames over 490 unique verbs using our association rule learning method on Thai verbs.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"76 1","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2023-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Asian and Low-Resource Language Information Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3638533","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

VerbNet is a lexical resource for verbs that has many applications in natural language processing tasks, especially ones that require information about both the syntactic behavior and the semantics of verbs. This paper presents an attempt to construct the first version of a Thai VerbNet corpus via data enrichment of the existing lexical resource. This corpus contains the annotation at both the syntactic and semantic levels, where verbs are tagged with frames within the verb class hierarchy and their arguments are labeled with the semantic role. We discuss the technical aspect of the construction process of Thai VerbNet and survey different semantic role labeling methods to make this process fully automatic. We also investigate the linguistic aspect of the computed verb classes and the results show the potential in assisting semantic classification and analysis. At the current stage, we have built the verb class hierarchy consisting of 28 verb classes from 112 unique concept frames over 490 unique verbs using our association rule learning method on Thai verbs.

查看原文本刊更多论文

支持泰语动词网构建的计算方法

动词网（VerbNet）是一种动词词库，在自然语言处理任务中有着广泛的应用，尤其是那些需要动词的句法行为和语义信息的任务。本文介绍了通过对现有词法资源进行数据丰富来构建第一版泰语 VerbNet 语料库的尝试。该语料库包含句法和语义两个层面的注释，其中动词被标记为动词类层次结构中的框架，其参数被标记为语义角色。我们讨论了泰语动词网构建过程的技术方面，并研究了不同的语义角色标注方法，以使这一过程完全自动化。我们还对计算出的动词类进行了语言方面的研究，结果显示了其在辅助语义分类和分析方面的潜力。在现阶段，我们使用关联规则学习方法对泰语动词进行了学习，从 112 个独特的概念框架和 490 个独特的动词中建立了由 28 个动词类别组成的动词类别层次结构。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Asian and Low-Resource Language Information Processing Computer Science-General Computer Science

CiteScore

3.60

自引率

15.00%

发文量

241

期刊介绍： The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to: -Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc. -Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc. -Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition. -Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc. -Machine Translation involving Asian or low-resource languages. -Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc. -Information Extraction and Filtering: including automatic abstraction, user profiling, etc. -Speech processing: including text-to-speech synthesis and automatic speech recognition. -Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc. -Cross-lingual information processing involving Asian or low-resource languages. -Papers that deal in theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.