Spam filtering by semantics-based text classification

Wei Hu, Jinglong Du, Yongkang Xing
Published in: 2016 Eighth International Conference on Advanced Computational Intelligence (ICACI)
Publication date: 2016-02-01
DOI: 10.1109/ICACI.2016.7449809 (https://doi.org/10.1109/ICACI.2016.7449809)
Citations: 20

Abstract

Spam has been a serious and annoying problem for decades. Although many solutions have been proposed, there is still considerable room for improvement in filtering spam email efficiently. A major problem in spam filtering, and in text classification in natural language processing more generally, is the very large vector space produced by the many feature terms, which usually leads to heavy computation and slow classification. Extracting semantic meanings from the text content and using them as feature terms to build the vector space, instead of using words as feature terms in the traditional way, could reduce the dimensionality of the vectors effectively and improve classification at the same time. This paper proposes a novel Chinese spam filtering approach based on semantics-based text classification, in which the feature terms are selected from the semantic meanings of the text content. Both the extraction of semantic meanings and the selection of feature terms are implemented by attaching annotations to the text layer by layer. The filter performed well in experiments on a public Chinese spam corpus.
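
The dimensionality argument in the abstract can be illustrated with a small sketch. It is not the authors' method: the abstract does not specify how the layered annotation works, so the sketch swaps in a plain dictionary lookup. The lexicon `CONCEPT_LEXICON`, the concept labels, and the English toy messages are all assumptions chosen for readability (the paper works on Chinese text).

```python
from collections import Counter

# Hypothetical word-to-concept lexicon, assumed for illustration only.
# The paper derives semantic features by annotating text layer by layer;
# a flat dictionary lookup is used here as a stand-in.
CONCEPT_LEXICON = {
    "cheap": "PRICE",
    "discount": "PRICE",
    "bargain": "PRICE",
    "buy": "PURCHASE",
    "order": "PURCHASE",
    "purchase": "PURCHASE",
    "meeting": "WORK",
    "report": "WORK",
    "schedule": "WORK",
}


def word_features(tokens):
    """Traditional bag of words: one dimension per distinct word."""
    return Counter(tokens)


def concept_features(tokens, lexicon=CONCEPT_LEXICON):
    """Bag of concepts: one dimension per semantic label.

    Tokens missing from the lexicon are dropped in this sketch.
    """
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def vocabulary(feature_bags):
    """Sorted list of all feature names seen across the given bags."""
    vocab = set()
    for bag in feature_bags:
        vocab.update(bag)
    return sorted(vocab)


# Toy, pre-tokenized messages (invented for this example).
docs = [
    "cheap discount buy order".split(),
    "bargain purchase cheap".split(),
    "meeting report schedule".split(),
]

word_vocab = vocabulary(word_features(d) for d in docs)
concept_vocab = vocabulary(concept_features(d) for d in docs)

print(len(word_vocab), "word dimensions")
print(len(concept_vocab), "concept dimensions:", concept_vocab)
```

On this toy data the nine distinct words collapse into three concept dimensions, and the two spam-like messages now share features ("PRICE", "PURCHASE") even where they share few words. How much reduction a real lexicon or annotation scheme gives depends entirely on its coverage and granularity.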