A novel and secured email classification using deep neural network with bidirectional long short-term memory

IF 3.1, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Speech and Language Pub Date : 2024-05-27 DOI:10.1016/j.csl.2024.101667

A. Poobalan, K. Ganapriya, K. Kalaivani, K. Parthiban

Volume 89, Article 101667. Full text: https://www.sciencedirect.com/science/article/pii/S0885230824000500

Citations: 0

Abstract

Email data differs from other social media data in several respects: a wide range of answers, formal language, notable variation in length, a high degree of anomalies, and indirect relationships. The main goal of this research is to develop a robust and computationally efficient classifier that can distinguish spam from regular email content. The publicly available benchmark Enron dataset was used for the tests. The six distinct Enron data sets we acquired were combined to generate the final seven Enron data sets. The dataset undergoes early preprocessing to remove superfluous sentences. The proposed Bidirectional Long Short-Term Memory (BiLSTM) model examines email documents for spam and applies spam labels. On the seven Enron datasets, DNN-BiLSTM outperforms the other classifiers in terms of accuracy. Compared with other machine learning classifiers, DNN-BiLSTM and convolutional neural networks classified spam with 96.39% and 98.69% accuracy, respectively. The paper also covers the risks associated with cloud data management and potential security flaws. It presents hybrid encryption as a means of protecting cloud data while preserving privacy, using the hybrid AES-Rabit encryption algorithm, which is based on symmetric session key exchange.
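
To make the bidirectional idea in the abstract concrete, the following is a minimal sketch of a BiLSTM forward pass in plain Python. It is not the authors' DNN-BiLSTM: the abstract gives no layer sizes, embeddings, or training details, so every name here (`init_lstm`, `run_lstm`, `bilstm_spam_probability`), the dimensions, and the random untrained weights are assumptions for illustration, and the dense part of the network is reduced to a single logistic output unit.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(input_size, hidden_size, rng):
    """Random weights for one LSTM direction (hypothetical sizes, untrained)."""
    cols = input_size + hidden_size
    rows = 4 * hidden_size  # one block of rows per gate: input, forget, candidate, output
    return {
        "W": [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)],
        "b": [0.0] * rows,
        "hidden_size": hidden_size,
    }

def lstm_step(params, x, h, c):
    """One LSTM cell update: returns the new hidden state and cell state."""
    n = params["hidden_size"]
    xh = x + h  # concatenate the input vector and the previous hidden state
    z = [sum(w * v for w, v in zip(row, xh)) + b
         for row, b in zip(params["W"], params["b"])]
    i = [sigmoid(v) for v in z[0:n]]              # input gate
    f = [sigmoid(v) for v in z[n:2 * n]]          # forget gate
    g = [math.tanh(v) for v in z[2 * n:3 * n]]    # candidate cell values
    o = [sigmoid(v) for v in z[3 * n:4 * n]]      # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run_lstm(params, sequence):
    """Run one direction over the sequence and return the final hidden state."""
    n = params["hidden_size"]
    h, c = [0.0] * n, [0.0] * n
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return h

def bilstm_spam_probability(sequence, fwd, bwd, out_w, out_b):
    """Read the sequence in both directions, concatenate the two final
    hidden states, and map them to a probability with a logistic unit."""
    h_fwd = run_lstm(fwd, sequence)
    h_bwd = run_lstm(bwd, list(reversed(sequence)))
    features = h_fwd + h_bwd
    return sigmoid(sum(w * v for w, v in zip(out_w, features)) + out_b)

rng = random.Random(0)
EMBED, HIDDEN = 4, 3  # assumed toy dimensions
fwd = init_lstm(EMBED, HIDDEN, rng)
bwd = init_lstm(EMBED, HIDDEN, rng)
out_w = [rng.uniform(-0.1, 0.1) for _ in range(2 * HIDDEN)]
out_b = 0.0
# Stand-in for an embedded email: five tokens, each a 4-dimensional vector.
email = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(5)]
p_spam = bilstm_spam_probability(email, fwd, bwd, out_w, out_b)
print(f"untrained spam probability: {p_spam:.3f}")
```

Because the weights are random, the printed probability carries no meaning; the point is only that the backward pass sees the tokens in reverse order, so the concatenated features summarize context from both ends of the message. A real system would use a trained framework implementation rather than this hand-written loop.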


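Source journal

Computer Speech and Language (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore

11.30

Self-citation rate

4.70%

Articles published

Review time

22.9 weeks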

About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.