Opinion Spam Detection Using Feature Selection

Rinki Patel, Priyank Thakkar
DOI: 10.1109/CICN.2014.127 (https://doi.org/10.1109/CICN.2014.127)
Published in: 2014 International Conference on Computational Intelligence and Communication Networks, pp. 560-564
Publication date: 2014-11-14
Citations: 18

Abstract

Opinion Spam Detection Using Feature Selection
E-commerce businesses now routinely let their customers write reviews of the products and services they have used. Such reviews are a vital source of information: prospective customers consult them before deciding on a purchase, and marketers use them to find the drawbacks of their own products or services and to learn about those of their competitors, which in turn helps identify the strengths and weaknesses of products. Unfortunately, the usefulness of opinions has also given rise to opinion spam, that is, forged positive or spiteful negative reviews. This paper focuses on the detection of deceptive opinion spam. A recently proposed opinion spam detection method based on n-gram techniques is extended by means of feature selection and different representations of the opinions. The problem is modelled as a classification problem, and a Naïve Bayes (NB) classifier and a Least Squares Support Vector Machine (LS-SVM) are applied to three representations of the opinions: Boolean, bag-of-words, and term frequency-inverse document frequency (TF-IDF). All experiments are carried out on a widely used gold-standard dataset.
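The three representations named in the abstract can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names (`ngrams`, `build_vocabulary`, `represent`), the whitespace tokenizer, the two toy reviews, and the TF-IDF variant (raw count times log(N/df)). The abstract does not name the feature selection criterion, so that step and the NB and LS-SVM classifiers are left out.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return all word n-grams (n = 1..n_max) of a lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(docs, n_max=2):
    """Sorted list of every n-gram that occurs in the corpus."""
    return sorted({g for d in docs for g in ngrams(d, n_max)})


def represent(docs, vocab, scheme="tfidf", n_max=2):
    """Encode each document as a vector over vocab.

    scheme is one of:
      "boolean": 1 if the n-gram occurs in the document, else 0
      "bow":     raw occurrence count
      "tfidf":   count * log(N / df), one common TF-IDF variant (assumed)
    """
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each n-gram.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    vectors = []
    for c in counts:
        if scheme == "boolean":
            vec = [1 if c[t] > 0 else 0 for t in vocab]
        elif scheme == "bow":
            vec = [c[t] for t in vocab]
        elif scheme == "tfidf":
            vec = [c[t] * math.log(n_docs / df[t]) if df[t] else 0.0
                   for t in vocab]
        else:
            raise ValueError("unknown scheme: " + scheme)
        vectors.append(vec)
    return vectors


# Hypothetical toy reviews, not taken from the gold-standard dataset.
reviews = ["great hotel great staff", "terrible hotel rude staff"]
vocab = build_vocabulary(reviews, n_max=1)
for scheme in ("boolean", "bow", "tfidf"):
    print(scheme, represent(reviews, vocab, scheme, n_max=1))
```

In this sketch a term that occurs in every document, such as "hotel", gets a TF-IDF weight of zero, while it still counts as present in the Boolean and bag-of-words vectors. Differences of this kind are why the choice of representation can affect a classifier.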