A Frequent Construction Mining Scheme Based on Syntax Tree

IF 3.9 · Zone 4, Computer Science · Q1 COMPUTER SCIENCE, THEORY & METHODS

Romanian Journal of Information Science and Technology Pub Date : 2023-03-24 DOI:10.59277/romjist.2023.1.01

Bo Chen, Weiming Peng, Jihua Song


Citations: 0

Abstract

Natural language processing (NLP) is one of the main research directions in artificial intelligence, and one of its goals is to identify the semantic information in text. Mainstream semantic recognition tasks currently rely mostly on the semantic information of individual words to analyze the meaning of a whole sentence. Research on semantics in cognitive linguistics, however, indicates that meaning is determined both by the words a sentence contains and by how those words are arranged. Linguists refer to arrangements and combinations of words that carry certain semantic information as constructions. Because constructions play an essential role in conveying semantic information, identifying the constructions in a text is a crucial part of semantic recognition. Against this background, the paper makes two main contributions: 1) it proposes a definition and program representation of constructions, together with the corresponding constraints, for NLP tasks; 2) it proposes a frequent construction mining algorithm that extracts frequent structures meeting the construction requirements from the grammar structure tree. With these in place, a construction database can be extracted for a specified natural language corpus, which helps make text semantic analysis more effective.

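The abstract does not spell out the mining algorithm, so the following is only a rough illustration of the general idea of mining frequent structures from syntax trees, not the authors' method. It assumes trees are nested tuples of the form `(label, child, ...)` with pre-terminals written as `(POS, word)`, and it treats a "structure" as a single parent label with its sequence of child labels. The names `local_patterns`, `mine_frequent`, and `min_support`, and the toy corpus, are all hypothetical.

```python
from collections import Counter

def local_patterns(tree):
    """Yield (parent_label, child_labels) for every node whose children are subtrees.

    Assumption: a tree is a nested tuple (label, child, ...); a pre-terminal
    such as ("NN", "book") has a plain string child and yields no pattern.
    """
    label, *children = tree
    if children and all(isinstance(c, tuple) for c in children):
        yield (label, tuple(c[0] for c in children))
        for child in children:
            yield from local_patterns(child)

def mine_frequent(trees, min_support):
    """Keep patterns that occur in at least min_support trees."""
    support = Counter()
    for tree in trees:
        # A set counts each pattern once per tree (sentence-level support).
        support.update(set(local_patterns(tree)))
    return {p: n for p, n in support.items() if n >= min_support}

# Hypothetical toy corpus of three parsed sentences.
corpus = [
    ("S", ("NP", ("PRP", "She")),
          ("VP", ("VBD", "gave"), ("NP", ("PRP", "him")),
                 ("NP", ("DT", "a"), ("NN", "book")))),
    ("S", ("NP", ("PRP", "He")),
          ("VP", ("VBD", "sent"), ("NP", ("PRP", "her")),
                 ("NP", ("DT", "a"), ("NN", "letter")))),
    ("S", ("NP", ("PRP", "They")), ("VP", ("VBD", "slept"))),
]

frequent = mine_frequent(corpus, min_support=2)
```

In this toy corpus the double-object pattern `("VP", ("VBD", "NP", "NP"))` appears in two of the three trees and is kept, while `("VP", ("VBD",))` appears only once and is dropped. A real construction miner would presumably consider larger subtrees and apply the construction constraints the paper defines; this sketch ignores both.
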

Source journal

Romanian Journal of Information Science and Technology (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore

5.50

Self-citation rate

8.60%

Articles published

Review time

>12 weeks

About the journal: The primary objective of this journal is to publish original research results in information science and technology. There is no restriction on the topics addressed; the only acceptance criteria are the originality and quality of the articles, as assessed by independent reviewers. Contributions to recently emerging areas are encouraged. Romanian Journal of Information Science and Technology (a publication of the Romanian Academy) is indexed and abstracted in the following Thomson Reuters products and information services: • Science Citation Index Expanded (also known as SciSearch®), • Journal Citation Reports/Science Edition.