Research of machine learning method for specific information recognition on the Internet

Dequan Zheng, Y. Hu, T. Zhao, Hao Yu, Sheng Li
{"title":"Research of machine learning method for specific information recognition on the Internet","authors":"Dequan Zheng, Y. Hu, T. Zhao, Hao Yu, Sheng Li","doi":"10.1109/ICMI.2002.1166998","DOIUrl":null,"url":null,"abstract":"With the available resources on the Internet becoming plentiful, a large amount of harmful information is permeating in and has been seriously affecting people's normal work and living. Therefore, harmful data streams must be recognized and filtered out effectively. After analyzing some harmful contents in Internet information streams, we present a new method, which recognizes specific information by machine learning (ML). We extracted key information from a number of corpuses through the ML method to obtain the part of speech (POS) transfer-form for key information by learning from corpuses, which is based on the same pronunciation matching of key information. Furthermore, the testing value of key information will be obtained in a real corpus to examine the likelihood between matching rules from information streams and those learnt from corpuses through the average value of POS transfer probability of key information. Therefore, the testing value for the whole real data stream will be obtained The experiment proved that the method was efficient for recognizing certain Internet harmful information.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMI.2002.1166998","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

With the available resources on the Internet becoming plentiful, a large amount of harmful information is permeating in and has been seriously affecting people's normal work and living. Therefore, harmful data streams must be recognized and filtered out effectively. After analyzing some harmful contents in Internet information streams, we present a new method, which recognizes specific information by machine learning (ML). We extracted key information from a number of corpuses through the ML method to obtain the part of speech (POS) transfer-form for key information by learning from corpuses, which is based on the same pronunciation matching of key information. Furthermore, the testing value of key information will be obtained in a real corpus to examine the likelihood between matching rules from information streams and those learnt from corpuses through the average value of POS transfer probability of key information. Therefore, the testing value for the whole real data stream will be obtained The experiment proved that the method was efficient for recognizing certain Internet harmful information.
互联网特定信息识别的机器学习方法研究
随着网络资源的日益丰富,大量有害信息不断渗透进来,严重影响着人们的正常工作和生活。因此,必须有效地识别和过滤有害数据流。在分析了互联网信息流中一些有害内容的基础上,提出了一种利用机器学习识别特定信息的新方法。我们通过ML方法从多个语料库中提取关键信息,通过对语料库的学习获得关键信息的词性转换,该转换基于关键信息的同音匹配。然后在真实语料库中得到关键信息的测试值,通过关键信息的POS传递概率的平均值来检验信息流中的匹配规则与语料库中学习到的匹配规则之间的似然度。实验证明,该方法对某些网络有害信息的识别是有效的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信