Research on Intelligent Construction of China English Network New Words Database Based on Adjacent Entropy Recognition Algorithm
Y. Zu. 2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE), 2022-10-28. DOI: 10.1109/ECICE55674.2022.10042904
With the development and spread of the Internet, online networks now play an important role in people's daily lives and work and in the dissemination of trending social information. Most online China English neologisms spread widely through Internet platforms, where the public comes to know and use them. New word recognition plays an important role in Chinese word segmentation and information retrieval. As large numbers of new words emerge in China English, the lack of a China English neologism database has become a major obstacle to the study of China English, and new word recognition is a key technical problem in building such a corpus. Existing new word recognition algorithms have two weaknesses: pointwise mutual information (PMI) measures the internal cohesion of candidate words inadequately, and adjacency entropy is low. To address these weaknesses, this paper proposes a new word recognition algorithm for China English. The algorithm also addresses a problem with identifying new words by a single PMI threshold: invalid phrases can pass the threshold, while some genuine new word groups fall below it. Experimental results show that, with the same data and experimental environment, the method improves accuracy, recall, and F-score, indicating that it is effective and feasible for corpus construction.
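The abstract relies on two statistics without defining them. The sketch below shows them in their standard textbook forms. It assumes a whitespace-tokenized English corpus. The first statistic is pointwise mutual information, PMI(x, y) = log2 p(xy) / (p(x) p(y)), which measures how strongly the parts of a candidate phrase stick together. The second is left and right adjacency entropy, H = -Σ p(a) log2 p(a), taken over the words seen immediately before or after the candidate, which measures how freely the candidate combines with its context.

This is not the author's algorithm. It applies fixed thresholds, which is the single-threshold setting the paper says it improves on. The following are all hypothetical choices made for illustration: the function names, the thresholds (`min_count`, `min_pmi`, `min_entropy`), the use of the minimum PMI over binary splits for multi-word candidates, the count / total-tokens probability estimate, and the toy corpus.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_n=3):
    """Count every n-gram (stored as a tuple of tokens) of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def pmi(candidate, counts, total):
    """Internal cohesion: the minimum PMI over all binary splits of candidate.

    Assumption (hypothetical): probabilities are estimated as
    count / total token count for n-grams of every length.
    """
    p_whole = counts[candidate] / total
    scores = []
    for k in range(1, len(candidate)):
        p_left = counts[candidate[:k]] / total
        p_right = counts[candidate[k:]] / total
        scores.append(math.log2(p_whole / (p_left * p_right)))
    return min(scores)


def entropy(counter):
    """Shannon entropy (bits) of a frequency distribution; 0.0 if empty."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def adjacency_entropy(candidate, tokens):
    """Return (left, right) entropy of the tokens just before/after candidate."""
    n = len(candidate)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == candidate:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + n < len(tokens):
                right[tokens[i + n]] += 1
    return entropy(left), entropy(right)


def find_new_words(tokens, max_n=3, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Hypothetical fixed-threshold filter: keep multi-word n-grams that are
    frequent, internally cohesive (PMI) and freely combinable (both entropies)."""
    counts = ngram_counts(tokens, max_n)
    total = len(tokens)
    found = []
    for cand, freq in counts.items():
        if len(cand) < 2 or freq < min_count:
            continue
        cohesion = pmi(cand, counts, total)
        h_left, h_right = adjacency_entropy(cand, tokens)
        freedom = min(h_left, h_right)
        if cohesion >= min_pmi and freedom >= min_entropy:
            found.append((" ".join(cand), freq, round(cohesion, 3), round(freedom, 3)))
    # Most frequent candidates first.
    return sorted(found, key=lambda r: -r[1])


if __name__ == "__main__":
    # Invented toy corpus containing the China English expression "add oil".
    corpus = ("we say add oil to friends . they shout add oil at games . "
              "fans cry add oil for players . i say good luck to friends .")
    print(find_new_words(corpus.split()))  # → [('add oil', 3, 3.222, 1.585)]
```

On the toy corpus, "to friends" is frequent and cohesive, but it is rejected. It is always followed by the same token, so its right entropy is 0. This is the kind of case that adjacency entropy is meant to filter out. The sketch rescans the corpus for every candidate only for readability. A practical version would collect neighbor counts in a single pass.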