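Research on Intelligent Construction of China English Network New Words Database Based on Adjacent Entropy Recognition Algorithm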

Y. Zu
Published in: 2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE)
Publication date: 2022-10-28
DOI: 10.1109/ECICE55674.2022.10042904
Citations: 0

Abstract

As the Internet has developed and spread, it has come to play an important role in people's daily life and work and in the dissemination of trending social information. Most online China English neologisms spread widely through Internet platforms, where people come to know and use them. New word recognition plays an important role in Chinese word segmentation and information retrieval. With the emergence of a large number of new words in China English, the lack of a China English neologism database has become a major obstacle to the study of China English. New word recognition is a major technical issue in building a corpus. To address the low internal cohesion of the words identified by existing new word recognition algorithms based on pointwise mutual information (PMI) and adjacency entropy, a new word recognition algorithm for China English is proposed. The algorithm also addresses two problems that arise when PMI alone is used to identify new words: a single threshold setting that lets invalid phrases through, and a threshold that is too low for new word groups. The experimental results show that, under the same data and experimental environment, the method improves accuracy, recall, and F value, and is therefore effective and feasible for corpus construction.
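The abstract names two statistics, pointwise mutual information and adjacency entropy, without defining them. The following is a minimal sketch of their textbook definitions applied to adjacent word pairs; it is not the paper's algorithm. The function names, the restriction to two-word candidates, and the thresholds `min_pmi` and `min_entropy` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def pmi_and_adjacency_entropy(tokens):
    """Score every adjacent word pair by PMI and left/right adjacency entropy.

    Textbook definitions only; this is an illustrative sketch, not the
    method proposed in the paper.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_bigrams = max(n - 1, 1)

    # Collect the words seen immediately before and after each pair.
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for i in range(n - 1):
        pair = (tokens[i], tokens[i + 1])
        if i > 0:
            left[pair][tokens[i - 1]] += 1
        if i + 2 < n:
            right[pair][tokens[i + 2]] += 1

    def entropy(counter):
        # Shannon entropy (in bits) of the neighbour distribution.
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log2(c / total) for c in counter.values())

    scores = {}
    for pair, count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = unigrams[pair[0]] / n
        p_y = unigrams[pair[1]] / n
        pmi = math.log2(p_xy / (p_x * p_y))  # internal cohesion of the pair
        scores[pair] = (pmi, entropy(left[pair]), entropy(right[pair]))
    return scores


def select_candidates(scores, min_pmi, min_entropy):
    """Keep pairs that are cohesive (high PMI) and occur in varied contexts.

    Both thresholds are hypothetical parameters chosen by the caller.
    """
    return [
        pair
        for pair, (pmi, h_left, h_right) in scores.items()
        if pmi >= min_pmi and min(h_left, h_right) >= min_entropy
    ]
```

For example, `select_candidates(pmi_and_adjacency_entropy(text.split()), 2.0, 1.5)` would return the word pairs in the hypothetical string `text` that pass both filters. Requiring both a PMI floor and an entropy floor reflects the general idea that a candidate should hold together internally while appearing with varied neighbours; how the paper combines or adapts these thresholds is not stated in the abstract.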