Spam Detection on Social Media Using Semantic Convolutional Neural Network

Int. J. Knowl. Discov. Bioinform. Pub Date : 1900-01-01 DOI:10.4018/IJKDB.2018010102

Gauri Jain, Manisha Sharma, Basant Agarwal


Citations: 52

Abstract

This article describes how spam detection in social media text is becoming increasingly important because of the exponential increase in spam volume over the network. Detection is challenging, especially for text restricted to a limited number of characters. Effective spam detection requires learning a larger number of efficient features. The article proposes a deep learning technique, the convolutional neural network (CNN), for spam detection, with an added semantic layer on top of it. The resulting model is called a semantic convolutional neural network (SCNN). The semantic layer is built by training randomly initialized word vectors with Word2vec to obtain semantically enriched word embeddings. WordNet and ConceptNet are used to find a word similar to a given word when that word is missing from the Word2vec vocabulary. The architecture is evaluated on two corpora: the SMS Spam dataset (UCI repository) and a Twitter dataset (tweets scraped from public live tweets). The authors' approach outperforms state-of-the-art results, with 98.65% accuracy on the SMS Spam dataset and 94.40% accuracy on the Twitter dataset.
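The lookup that the abstract attributes to the semantic layer can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionaries `PRETRAINED` and `RELATED_WORDS` are hypothetical stand-ins for a Word2vec vocabulary and for WordNet/ConceptNet queries, and the function names, the three-dimensional vectors, and the final random-vector fallback are assumptions made only for illustration.

```python
import random

# Hypothetical stand-in for a pretrained Word2vec vocabulary (toy 3-d vectors).
PRETRAINED = {
    "free": [0.9, 0.1, 0.0],
    "prize": [0.8, 0.2, 0.1],
    "meeting": [0.0, 0.7, 0.6],
}

# Hypothetical stand-in for similar-word lookups in WordNet / ConceptNet.
RELATED_WORDS = {
    "award": ["trophy", "prize"],
    "complimentary": ["free"],
}


def embed_token(token, pretrained, related, dim, rng):
    """Return (vector, source) for one token.

    Order of preference (an assumption, not stated in the abstract):
    1. the token's own pretrained vector,
    2. the vector of the first related word that has one,
    3. a small random vector.
    """
    if token in pretrained:
        return pretrained[token], "word2vec"
    for candidate in related.get(token, []):
        if candidate in pretrained:
            return pretrained[candidate], "related:" + candidate
    return [rng.uniform(-0.25, 0.25) for _ in range(dim)], "random"


def embed_message(text, pretrained=PRETRAINED, related=RELATED_WORDS, dim=3, seed=0):
    """Embed a whitespace-tokenized message as a list of (vector, source) pairs."""
    rng = random.Random(seed)
    return [
        embed_token(tok, pretrained, related, dim, rng)
        for tok in text.lower().split()
    ]


if __name__ == "__main__":
    for vector, source in embed_message("Complimentary award inside"):
        print(source, vector)
```

In this sketch "complimentary" and "award" are missing from the toy vocabulary and borrow the vectors of "free" and "prize", while "inside" has no related entry and receives a random vector. The resulting sequence of vectors is the kind of input a CNN text classifier would consume.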

