Email classification via intention-based segmentation

2020 7th International Conference on Electrical Engineering, Computer Sciences and Informatics (EECSI) Pub Date : 2020-10-01 DOI:10.23919/EECSI50503.2020.9251306

S. K. Sonbhadra, Sonali Agarwal, M. Syafrullah, K. Adiyarta

Pages: 38-44

Citations: 3

Abstract

Email is the most popular means of personal and official communication among people and organizations. Because email operates in an untrusted virtual environment, email systems face frequent attacks such as malware, spamming, and social engineering. Spamming is the most common malicious activity: unsolicited emails are sent in bulk, and these spam emails can carry malware and waste resources, thereby degrading productivity. In spam filter development, the most important challenge is to find the correlation between the nature of spam and the interests of users, because those interests are dynamic. This paper proposes a novel dynamic spam filter model that accounts for changes in users' interests over time while handling spam. It uses intention-based segmentation to compare individual segments of text documents instead of comparing the documents as a whole. The proposed spam filter is a multi-tier approach. In the first stage, the email content is divided into segments with the help of part-of-speech (POS) tagging based on voices and tenses. In the second stage, the segments are clustered using hierarchical clustering and compared using the vector space model. In the third stage, concept drift is detected in the clusters to identify changes in the user's interests. In the last stage, ham emails are classified into various categories. The Enron dataset is used for the experiments, and the obtained results are promising.
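The segment-comparison idea can be illustrated with a minimal sketch, which should not be read as the authors' implementation. The abstract only says that segments come from POS tagging on voices and tenses and that clustering is hierarchical. The sketch therefore makes several assumptions: segmentation is replaced by plain sentence splitting, the vector space model is a raw term-frequency bag of words, the clustering is single-linkage agglomerative, and the similarity threshold of 0.3 is arbitrary. All function names (split_segments, tf_vector, cosine, cluster_segments) are hypothetical.

```python
import math
import re
from collections import Counter


def split_segments(text):
    """Stand-in segmenter: one segment per sentence.

    Assumption: the paper segments by POS tags (voice and tense);
    sentence splitting is used here only to keep the sketch dependency-free.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tf_vector(segment):
    """Bag-of-words term-frequency vector for one segment."""
    return Counter(re.findall(r"[a-z']+", segment.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_segments(segments, threshold=0.3):
    """Single-linkage agglomerative clustering over segment indices.

    Repeatedly merges the most similar pair of clusters and stops once
    the best available similarity falls below the (assumed) threshold.
    """
    vectors = [tf_vector(s) for s in segments]
    clusters = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: similarity of the closest pair of members.
                sim = max(cosine(vectors[i], vectors[j])
                          for i in clusters[a] for j in clusters[b])
                if sim > best:
                    best, pair = sim, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


email = ("The meeting is scheduled for Monday. "
         "Please confirm the meeting on Monday. "
         "You have won a free prize. "
         "Claim your free prize now.")
segments = split_segments(email)
print(cluster_segments(segments))  # [[0, 1], [2, 3]]
```

In this toy email, the two scheduling sentences share enough vocabulary to fall into one cluster and the two prize sentences into another, which is the sense in which segments are compared individually instead of the email as a whole. A drift detector or ham classifier would operate on such clusters; neither is sketched here because the abstract gives no detail on how they work.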

