Efficient email classification strategy based on semantic methods

International Journal of Engineering in Computer Science, Pub Date: 2020-07-01, DOI: 10.33545/26633582.2020.v2.i2a.35

Rajendra Prasad Kudumula


Citations: 0

Abstract

Email has become one of the most important applications in daily life. The continuous increase in the number of email users has led to a massive growth in unsolicited emails, also known as spam. Managing and classifying this large volume of emails is an important challenge. This paper presents an efficient email filtering approach based on semantic methods. The proposed approach employs the WordNet ontology and applies different semantic-based methods and similarity measures to reduce the large number of extracted textual features; as a result, space and time complexities are reduced. Most approaches introduced to solve this problem handle the high dimensionality of emails through syntactic feature selection. Moreover, to obtain a minimal optimal feature set, feature dimensionality reduction is integrated using feature selection techniques such as Principal Component Analysis (PCA) and Correlation Feature Selection (CFS). Experimental results on the standard benchmark Enron dataset show that the proposed semantic filtering approach, combined with feature selection, achieves high computational performance at high space and time reduction rates. A comparative study of several classification algorithms indicates that Logistic Regression achieves the highest accuracy compared with Naïve Bayes, Support Vector Machine, J48, Random Forest, and radial basis function networks. With the CFS feature selection technique integrated, the average recorded accuracy across all algorithms used is above 90%, with a reduction rate of more than 90%. In addition, the experiments show that the proposed work performs significantly better than other related works, with higher accuracy and less time.
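The semantic feature reduction described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `TOY_CONCEPTS` table is a small hand-written stand-in for WordNet synsets, and the function names are hypothetical. It only shows how merging words that share a concept shrinks the feature vocabulary before any classifier or PCA/CFS step is applied.

```python
from collections import Counter

# Toy stand-in for WordNet synsets (an assumption for illustration only):
# each surface word maps to a representative concept label.
TOY_CONCEPTS = {
    "buy": "purchase",
    "purchase": "purchase",
    "cheap": "inexpensive",
    "inexpensive": "inexpensive",
    "conference": "meeting",
    "meeting": "meeting",
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def syntactic_features(text):
    """Bag-of-words counts over raw tokens."""
    return Counter(tokenize(text))


def semantic_features(text, concepts=TOY_CONCEPTS):
    """Bag-of-concepts counts: tokens that share a concept are merged."""
    return Counter(concepts.get(tok, tok) for tok in tokenize(text))


def reduction_rate(emails, concepts=TOY_CONCEPTS):
    """Fraction of the vocabulary removed by merging words into concepts."""
    raw_vocab, merged_vocab = set(), set()
    for email in emails:
        raw_vocab.update(syntactic_features(email))
        merged_vocab.update(semantic_features(email, concepts))
    if not raw_vocab:
        return 0.0  # no features, so nothing to reduce
    return 1 - len(merged_vocab) / len(raw_vocab)
```

In a fuller pipeline the concept lookup would come from an actual ontology, and the reduced count vectors would then be passed to a feature selector and a classifier such as logistic regression.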



