Deep Ensemble Model for Spam Classification in Twitter via Sentiment Extraction: Bio-Inspiration-Based Classification Model

Int. J. Image Graph. | Pub Date: 2022-07-28 | DOI: 10.1142/s0219467823500341

B. Ainapure, M. Boopathi, Dr. Chandra Sekhar Kolli, C. Jackulin

Abstract

Twitter spam has become a significant problem. Current work concentrates on machine learning models that detect spam on Twitter from statistical features of the tweets. Although these models achieve good results, it is hard to sustain the performance attained by supervised approaches. This paper introduces a deep learning-assisted spam classification model for Twitter, in which classification is based on the sentiments and topics modeled in the tweets. The first step is data collection. The collected data are then preprocessed with “stop word removal, stemming and tokenization”. The next step is feature extraction, in which part-of-speech (POS) tagging, headwords, rule-based lexicon, word length, and weighted holoentropy features are extracted. Then, the proposed sentiment score extraction is carried out to analyze how the scores vary between spam and nonspam information. Finally, the data circulating on Twitter are classified as spam or nonspam. For this, an Optimized Deep Ensemble technique is introduced that comprises “neural network (NN), support vector machine (SVM), random forest (RF) and convolutional neural network (DNN)”. In particular, the weights of the DNN are optimally tuned by an arithmetic crossover-based cat swarm optimization (AC-CS) model. Lastly, the developed approach is evaluated against existing techniques. At a learning percentage of 80, the proposed AC-CS [Formula: see text] ensemble model attained an accuracy that is 18.1%, 14.89%, 11.7%, 12.77%, 10.64%, 6.38%, 6.38%, and 6.38% higher than the SVM, DNN, RNN, DBN, MFO [Formula: see text] ensemble, WOA [Formula: see text] ensemble, EHO [Formula: see text] ensemble, and CSO [Formula: see text] ensemble models, respectively.
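The abstract names three mechanical steps that are easy to picture in code: the preprocessing chain (tokenization, stop word removal, stemming), the arithmetic crossover used inside the optimizer, and the combination of several classifiers into one decision. The following is a minimal sketch under stated assumptions, not the authors' implementation. The stop word list and the suffix-stripping stemmer are hypothetical stand-ins for fuller resources, the crossover is the standard textbook form (a weighted average of two parent vectors) rather than the paper's exact AC-CS update, and majority voting is assumed because the abstract does not state how the ensemble members are combined.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (assumption, not from the paper).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for"}

# Crude suffix stripping as a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(tok) for tok in tokenize(text) if tok not in STOP_WORDS]


def arithmetic_crossover(parent_a, parent_b, alpha):
    """Blend two equal-length weight vectors: alpha * a + (1 - alpha) * b."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(parent_a, parent_b)]


def majority_vote(labels):
    """Return the most frequent label among the ensemble members' predictions."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    print(preprocess("Winning FREE prizes is easy, click the links!"))
    print(arithmetic_crossover([0.0, 1.0], [1.0, 3.0], 0.5))
    # Pretend outputs of four ensemble members for one tweet (illustrative only).
    print(majority_vote(["spam", "nonspam", "spam", "spam"]))
```

In an optimizer of this kind, each candidate solution would be a flattened vector of network weights, and the crossover would produce a new candidate lying between two existing ones. How AC-CS selects the parents and sets the blending factor is not described in the abstract.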
