Efficient email classification strategy based on semantic methods

Rajendra Prasad Kudumula
DOI: 10.33545/26633582.2020.v2.i2a.35
Journal: International Journal of Engineering in Computer Science
Published: 2020-07-01 (Journal Article)
Citations: 0

Abstract

Email has become one of the most important applications in daily life. The continuous growth in the number of email users has led to a huge increase in unsolicited emails, also known as spam. Managing and classifying this large volume of email is an important challenge. This paper presents an efficient email filtering approach based on semantic methods. The proposed approach employs the WordNet ontology and applies different semantic-based methods and similarity measures to reduce the large number of extracted textual features, which in turn reduces space and time complexity. Most approaches previously proposed for this problem handled the high dimensionality of emails using syntactic feature selection. In addition, to obtain a minimal optimal feature set, feature dimensionality reduction is integrated using feature selection techniques such as Principal Component Analysis (PCA) and Correlation Feature Selection (CFS). Experimental results on the standard benchmark Enron dataset showed that the proposed semantic filtering approach combined with feature selection achieves high computational performance with high space and time reduction rates. A comparative study of several classification algorithms showed that Logistic Regression achieves the highest accuracy compared with Naïve Bayes, Support Vector Machine, J48, Random Forest, and radial basis function (RBF) networks. With the CFS feature selection technique integrated, the average recorded accuracy across all the algorithms used is above 90%, with reduction rates above 90%. Moreover, the experiments showed that the proposed work performs significantly better than other related works, with higher accuracy and less running time.
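The semantic reduction idea can be illustrated by collapsing words with the same meaning into one concept feature, so that, for example, "cash" and "funds" count toward a single feature instead of two. The sketch below is hypothetical and is not the paper's procedure: the hand-written `CONCEPTS` table stands in for WordNet synset lookup and similarity measures, and `to_concept`, `concept_counts`, and `vocabulary_sizes` are illustrative names.

```python
from collections import Counter

# HYPOTHETICAL stand-in for WordNet: each surface word maps to a
# representative concept label. A real pipeline would look up WordNet
# synsets and apply similarity measures instead of this hand-written table.
CONCEPTS = {
    "cash": "money",
    "funds": "money",
    "money": "money",
    "award": "prize",
    "reward": "prize",
    "prize": "prize",
    "complimentary": "free",
    "free": "free",
}


def to_concept(word):
    """Map a word to its concept label; unknown words stay as their own feature."""
    w = word.lower()
    return CONCEPTS.get(w, w)


def concept_counts(tokens):
    """Bag-of-concepts: count tokens after merging synonyms into one feature."""
    return Counter(to_concept(t) for t in tokens)


def vocabulary_sizes(documents):
    """Return (word vocabulary size, concept vocabulary size) over tokenized documents."""
    words = {t.lower() for doc in documents for t in doc}
    concepts = {to_concept(w) for w in words}
    return len(words), len(concepts)


if __name__ == "__main__":
    emails = [
        "claim your cash prize now".split(),
        "free funds and a complimentary award".split(),
        "meeting agenda for monday".split(),
    ]
    print(concept_counts(emails[1]))
    n_words, n_concepts = vocabulary_sizes(emails)
    print(f"{n_words} word features -> {n_concepts} concept features")
```

On these three toy emails, the 15 distinct words collapse to 12 concept features; with a real ontology and many thousands of emails, the same merging is what shrinks the feature space.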
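CFS scores a subset of k features with the merit heuristic k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), where mean(r_cf) is the average feature-class correlation and mean(r_ff) the average feature-feature intercorrelation; it rewards features that predict the class but are not redundant with each other. The sketch below rests on stated assumptions: it uses absolute Pearson correlation (other measures, such as symmetric uncertainty, are also used), a simple greedy forward search instead of the best-first search CFS is commonly paired with, hypothetical function names, and toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, columns, labels):
    """Merit = k * mean|r_cf| / sqrt(k + k*(k-1) * mean|r_ff|) for a subset of k features."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Average feature-class correlation (relevance).
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    # Average feature-feature correlation (redundancy); zero for a single feature.
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(columns, labels):
    """Greedy forward search: add the feature that most raises merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        # max() returns the first maximal candidate, so ties go to the lower index.
        idx = max(remaining, key=lambda i: cfs_merit(selected + [i], columns, labels))
        score = cfs_merit(selected + [idx], columns, labels)
        if score <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = score
    return selected, best


def reduction_rate(n_selected, n_total):
    """Percentage of features removed by selection."""
    return 100.0 * (1 - n_selected / n_total)


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]          # 1 = spam, 0 = legitimate (toy labels)
    columns = [
        [1, 1, 1, 0, 0, 0],              # strongly relevant feature
        [1, 1, 1, 0, 0, 0],              # exact duplicate of feature 0 (redundant)
        [1, 0, 1, 0, 1, 0],              # weakly relevant feature
    ]
    chosen, merit = cfs_forward(columns, labels)
    print(chosen, round(merit, 3), f"{reduction_rate(len(chosen), len(columns)):.1f}% reduction")
```

On the toy data, the duplicate feature adds no merit and the weak feature lowers it, so only feature 0 is kept: one of three features, roughly a 66.7% reduction.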