Spam filtering by semantics-based text classification
Wei Hu, Jinglong Du, Yongkang Xing
2016 Eighth International Conference on Advanced Computational Intelligence (ICACI), published 2016-02-01
DOI: 10.1109/ICACI.2016.7449809 (https://doi.org/10.1109/ICACI.2016.7449809)
Citations: 20
Abstract
Spam has been a serious and annoying problem for decades. Although many solutions have been put forward, there is still considerable room for improvement in filtering spam emails efficiently. A major problem in spam filtering, as in text classification in natural language processing more generally, is the huge size of the vector space caused by the large number of feature terms, which usually leads to heavy computation and slow classification. Extracting semantic meanings from the content of texts and using these as feature terms to build the vector space, instead of using words as feature terms in the traditional way, can effectively reduce the dimension of the vectors and improve classification at the same time. In this paper, a novel Chinese spam filtering approach based on semantics-based text classification is proposed, in which the feature terms are selected from the semantic meanings of the text content. Both the extraction of semantic meanings and the selection of feature terms are implemented by attaching annotations to the texts layer by layer. The filter performed well in experiments on a public Chinese spam corpus.
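The dimensionality argument in the abstract can be illustrated with a minimal sketch. It is not the authors' layer-by-layer annotation method: the lexicon `CONCEPT_LEXICON`, the concept labels, the English toy tokens, and the helper names are all hypothetical, chosen only to show how mapping several words onto one semantic feature shrinks the vector space.

```python
from collections import Counter

# Hypothetical word-to-concept lexicon (illustrative only; not from the paper).
CONCEPT_LEXICON = {
    "cheap": "PRICE_OFFER",
    "discount": "PRICE_OFFER",
    "sale": "PRICE_OFFER",
    "invoice": "BILLING",
    "receipt": "BILLING",
    "meeting": "SCHEDULING",
    "schedule": "SCHEDULING",
}


def word_features(tokens):
    """Traditional bag-of-words: one feature term per distinct word."""
    return Counter(tokens)


def concept_features(tokens, lexicon=CONCEPT_LEXICON):
    """Semantic features: one feature term per concept found in the text."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def vocabulary(feature_maps):
    """Collect the sorted set of feature terms seen across all documents."""
    vocab = set()
    for features in feature_maps:
        vocab.update(features)
    return sorted(vocab)


# Two toy, pre-tokenized messages (assumed input; real Chinese text would
# first need word segmentation).
docs = [
    ["cheap", "discount", "sale", "invoice"],
    ["meeting", "schedule", "receipt"],
]

word_vocab = vocabulary(word_features(d) for d in docs)
concept_vocab = vocabulary(concept_features(d) for d in docs)

print(len(word_vocab), "word features vs", len(concept_vocab), "concept features")
```

In this toy setting the vectors are indexed by concept rather than by word, so synonyms and near-synonyms share a single dimension. How the concepts are actually derived in the paper is described only as layered annotation, which this sketch does not attempt to reproduce.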