A Frequent Construction Mining Scheme Based on Syntax Tree
Authors: Bo Chen, Weiming Peng, Jihua Song
Journal: Romanian Journal of Information Science and Technology
Publication date: 2023-03-24
DOI: https://doi.org/10.59277/romjist.2023.1.01
Natural language processing (NLP) is one of the main research directions in artificial intelligence, and one of its goals is to identify the various kinds of semantic information in a text. Current mainstream semantic recognition tasks rely mostly on the semantic information of each individual word to analyze the meaning of the entire sentence. Research on semantics in cognitive linguistics, however, indicates that meaning is determined both by the words a sentence contains and by how those words are arranged. Linguists refer to arrangements and combinations of words that carry semantic information of their own as constructions. Because constructions play an essential role in conveying semantic information, identifying the constructions in a text is a crucial part of semantic recognition. Against this background, the paper makes two main contributions: 1) it proposes a definition and program representation of constructions, together with the corresponding constraints, for NLP tasks; 2) it proposes a frequent construction mining algorithm that extracts from grammar structure trees the frequent structures that meet the construction requirements. With these two pieces, a construction database can be extracted for a specified natural language corpus, which supports more effective semantic analysis of text.
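The abstract does not spell out the mining algorithm, so the following is only a rough illustration of the general idea of counting frequent structures in syntax trees, not the authors' method. It assumes trees are encoded as nested tuples of the form `(label, child, ...)` with words as plain strings, treats a "structure" as a single parent label with its ordered child labels, and measures support as the number of trees containing that structure. The names `one_level_structures`, `frequent_structures`, and `min_support` are hypothetical.

```python
from collections import Counter

# Assumed encoding: a leaf is a word (str); an internal node is a tuple
# (label, child, child, ...). This is an illustrative format only.

def label_of(node):
    """Return the label of an internal node, or the word itself for a leaf."""
    return node if isinstance(node, str) else node[0]

def one_level_structures(tree):
    """Yield (parent_label, (child_label, ...)) for every internal node."""
    if isinstance(tree, str):
        return
    label, *children = tree
    yield (label, tuple(label_of(c) for c in children))
    for child in children:
        yield from one_level_structures(child)

def frequent_structures(trees, min_support):
    """Keep structures that occur in at least `min_support` distinct trees."""
    support = Counter()
    for tree in trees:
        # set() so a structure is counted once per tree, not once per occurrence
        support.update(set(one_level_structures(tree)))
    return {s: n for s, n in support.items() if n >= min_support}

corpus = [
    ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats"))),
    ("S", ("NP", "birds"), ("VP", ("V", "eat"), ("NP", "seeds"))),
    ("S", ("NP", "fish"), ("VP", ("V", "swim"))),
]
print(frequent_structures(corpus, min_support=2))
```

In this toy corpus only the structures `S -> NP VP` and `VP -> V NP` reach a support of 2; the lexical structures such as `NP -> dogs` each appear in a single tree and are filtered out. A real construction miner would additionally need multi-level subtrees and the construction constraints the paper defines.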
About the journal:
The primary objective of this journal is to publish original research results in information science and technology. There is no restriction on the topics addressed; the only acceptance criterion is the originality and quality of the articles, as assessed by independent reviewers. Contributions to recently emerging areas are encouraged.
Romanian Journal of Information Science and Technology (a publication of the Romanian Academy) is indexed and abstracted in the following Thomson Reuters products and information services:
• Science Citation Index Expanded (also known as SciSearch®),
• Journal Citation Reports/Science Edition.