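Research on Intelligent Construction of China English Network New Words Database Based on Adjacent Entropy Recognition Algorithm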

2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE). Published: 2022-10-28. DOI: 10.1109/ECICE55674.2022.10042904

Y. Zu


Citations: 0

Abstract


With the development and spread of the Internet, online platforms play an important role in people's daily life and work and in the dissemination of trending social information. Most online China English neologisms spread widely through Internet platforms and become known and used by the public. New word recognition plays an important role in Chinese word segmentation and information retrieval. As large numbers of new words emerge in China English, the lack of a China English neologism database has become a major obstacle to the study of China English, and new word recognition is a key technical problem in building such a corpus. To address problems in existing new word recognition algorithms concerning the internal cohesion of words as measured by pointwise mutual information (PMI) and low adjacency entropy, a new word recognition algorithm for China English is proposed. The algorithm also addresses the threshold problems that arise when PMI is used to identify new words: a single PMI threshold lets invalid phrases through, and the threshold for new word groups is set too low. The experimental results show that, on the same data and in the same experimental environment, the method improves accuracy, recall, and F-measure, indicating that it is effective and feasible for corpus construction.
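The abstract relies on two standard statistics for new word recognition: pointwise mutual information, PMI(a, b) = log2(p(ab) / (p(a) p(b))), which scores how strongly the parts of a candidate stick together, and adjacency (branching) entropy, H = -Σ p(w) log2 p(w) over the words seen immediately to the left or right of a candidate, which scores how freely the candidate combines with its context. The sketch below illustrates the common PMI-plus-adjacency-entropy baseline on a tokenized English corpus. It is not the author's algorithm, whose details are not given in the abstract. The function names, the rule of taking the minimum PMI over all split points, the min(left, right) entropy rule, and the threshold defaults are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def ngram_counts(tokens, max_n):
    """Count every n-gram (as a tuple) of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def neighbor_counts(tokens, max_n):
    """Collect left/right neighbour counts for every n-gram with n >= 2.

    Sentence edges are mapped to the placeholder symbols <s> and </s>."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            left[gram][tokens[i - 1] if i > 0 else "<s>"] += 1
            right[gram][tokens[i + n] if i + n < len(tokens) else "</s>"] += 1
    return left, right


def min_pmi(gram, counts, total):
    """Internal cohesion: minimum PMI over all binary splits of the n-gram.

    Probabilities are simplified to count / number of tokens (assumption)."""
    p_gram = counts[gram] / total
    scores = []
    for k in range(1, len(gram)):
        p_left = counts[gram[:k]] / total
        p_right = counts[gram[k:]] / total
        scores.append(math.log2(p_gram / (p_left * p_right)))
    return min(scores)


def entropy(counter):
    """Shannon entropy (bits) of a neighbour distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def find_new_words(tokens, max_n=3, min_count=2, min_pmi_value=1.0, min_entropy=0.5):
    """Return (phrase, count, PMI, boundary entropy) for candidates passing both filters.

    All threshold defaults are hypothetical, not values from the paper."""
    counts = ngram_counts(tokens, max_n)
    left, right = neighbor_counts(tokens, max_n)
    total = len(tokens)
    results = []
    for gram, count in counts.items():
        if len(gram) < 2 or count < min_count:
            continue
        cohesion = min_pmi(gram, counts, total)
        # A real word should have varied neighbours on BOTH sides.
        boundary = min(entropy(left[gram]), entropy(right[gram]))
        if cohesion >= min_pmi_value and boundary >= min_entropy:
            results.append((" ".join(gram), count, round(cohesion, 3), round(boundary, 3)))
    return sorted(results, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    corpus = ("we say add oil to them . fans shout add oil at games . "
              "you can add oil before exams . people type add oil online")
    print(find_new_words(corpus.split()))  # [('add oil', 4, 2.7, 2.0)]
```

On this toy corpus "add oil" occurs 4 times among 26 tokens, so its PMI is log2(26 / 4) ≈ 2.70, and it has four distinct neighbours on each side (entropy 2.0 bits). A single fixed `min_pmi_value` is exactly the kind of one-size-fits-all threshold the abstract identifies as a weakness; a larger real corpus would expose candidates that one global cut-off either admits wrongly or misses.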

