Email Spam Detection using Bidirectional Long Short Term Memory with Convolutional Neural Network

2020 IEEE Region 10 Symposium (TENSYMP), Pub Date: 2020-06-05, DOI: 10.1109/TENSYMP50017.2020.9230769

Sefat E Rahman, Shofi Ullah

Pages: 1307-1311

Citations: 8

Abstract

Email has become a very popular means of communication in the Internet era because it is cheap and easy to use for messaging and sharing important information. However, spam fills users' inboxes with large volumes of unwanted messages and wastes both resources and users' valuable time. An efficient and accurate technique is therefore required to identify whether a message is spam or ham. In this paper, we propose a new model for detecting spam messages based on sentiment analysis of the textual data in the email body. We combine word embeddings with a bidirectional LSTM (Bi-LSTM) network to analyze the sentimental and sequential properties of texts. Furthermore, we use a convolutional neural network to reduce training time and to extract higher-level text features for the Bi-LSTM network. We use two datasets, the lingspam dataset and the spam text message classification dataset, and adopt recall, precision, and F-score to evaluate our approach and compare it with others. Our model achieves an accuracy of about 98-99%. In addition, we demonstrate that our model outperforms not only some popular machine learning classifiers but also state-of-the-art approaches for detecting spam messages, demonstrating its superiority.
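The abstract names recall, precision, and F-score as its evaluation metrics but does not define them. The following is a minimal sketch of how these quantities are commonly computed for a binary spam/ham task. It assumes the balanced F1 variant and treats spam as the positive class (label 1); the function name `precision_recall_f1` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the `positive` class.

    Assumption: F-score here means the balanced F1 score; the abstract
    does not say which variant the authors used.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: spam correctly flagged as spam.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: ham wrongly flagged as spam.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: spam that slipped through as ham.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy labels made up for illustration (1 = spam, 0 = ham); not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting precision and recall alongside accuracy is useful for spam filtering because the two error types have different costs: a false positive hides a legitimate email, while a false negative only lets one spam message through.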

