Fatten Features and Drop Wastes: Finding Repeaters' Reviews by Feature Generation and Feature Selection

Naoki Muramoto, Hiromi Shiraga, Kilho Shin, Hiroaki Ohshima
Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services, published 2019-12-02.
DOI: 10.1145/3366030.3366133 (https://doi.org/10.1145/3366030.3366133)

Abstract

In this paper, we propose a method for determining whether a given restaurant review comment is a repeater's review, that is, one written by a customer who has visited the restaurant before. People often use restaurant review sites to decide which restaurant to go to. When reading a review comment, a human reader can usually tell whether the reviewer is a repeater of the restaurant. If a restaurant has many repeaters, it is presumably a good one. However, restaurant review sites usually do not provide a "revisit rate". We therefore tackle the problem of determining whether a review is a repeater's review. The difficulty is that many sentences in a review comment are of no use for this decision, such as those describing what was ordered, what was delicious, or how the price was. To address this difficulty, we take the following approach. First, a wide variety of features is extracted from review comments so as not to miss the features that characterize repeaters' reviews. Next, from this large feature set, only the features that actually contribute to the classification are selected by a feature selection method. Finally, classification is performed using a classifier. We have implemented the proposed method using super-CWC [12], a state-of-the-art feature selection method, and an SVM. The experimental results show that the proposed method performs better than the other methods.
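The three steps described in the abstract (generate many features, select the useful ones, classify) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the toy reviews are invented, the function names are hypothetical, the feature generator is limited to word unigrams and bigrams, a chi-square ranking stands in for super-CWC (which is not reproduced here), and the classifier is a small linear SVM trained by subgradient descent rather than the SVM implementation used in the paper.

```python
from collections import Counter

def generate_features(text, max_n=2):
    """Step 1: over-generate binary features (all word n-grams up to max_n)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}

def chi2(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 table (feature present/absent vs. label)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom

def select_features(feature_sets, labels, k):
    """Step 2: keep the k best-scoring features (stand-in for super-CWC)."""
    pos_total = sum(1 for y in labels if y == 1)
    neg_total = len(labels) - pos_total
    pos_count, neg_count = Counter(), Counter()
    for feats, y in zip(feature_sets, labels):
        (pos_count if y == 1 else neg_count).update(feats)
    scores = {}
    for f in set(pos_count) | set(neg_count):
        n11, n10 = pos_count[f], neg_count[f]
        scores[f] = chi2(n11, n10, pos_total - n11, neg_total - n10)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]

def vectorize(feats, selected):
    """Binary vector over the selected features only."""
    return [1.0 if f in feats else 0.0 for f in selected]

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Step 3: linear SVM, subgradient descent on L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss is active: move toward the sample
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    """Return 1 (repeater's review) or -1 (not a repeater's review)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Invented toy data: label 1 = repeater's review, -1 = other review.
reviews = [
    "i come here again every month",
    "visited again for the third time",
    "my usual place i always order ramen",
    "first visit and the ramen was great",
    "tried this place for the first time",
    "we ordered ramen and the price was fair",
]
labels = [1, 1, 1, -1, -1, -1]

feature_sets = [generate_features(r) for r in reviews]
selected = select_features(feature_sets, labels, k=5)
X = [vectorize(f, selected) for f in feature_sets]
w, b = train_linear_svm(X, labels)
predictions = [predict(w, b, x) for x in X]
print(selected)
print(predictions)
```

On a corpus this small the selector also keeps incidental words, which is one reason a redundancy-aware selector and a realistic dataset matter in practice; the sketch only shows how the three stages fit together.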