Instagram online shop's comment classification using statistical approach

2017 2nd International conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE) Pub Date: 2017-11-01 DOI: 10.1109/ICITISEE.2017.8285512

F. Prabowo, A. Purwarianti


Citations: 8

Abstract

Instagram is one of the currently popular social media platforms in Indonesia. Some online shop owners use Instagram to show their products and use its comment section to communicate with their customers. A system that automatically classifies Instagram comments by the response they require would help online shop owners respond to the various comments without having to read all of them. This study was therefore conducted to find the best method for classifying Instagram comments using a statistical approach. The system built in this study is divided into four components: pre-processing, feature extraction, feature selection, and classification. We compared several techniques in each component. For feature extraction, we compared a lexical approach (unigrams) with word embeddings. For the learning algorithm, we compared a support vector machine (SVM) with a convolutional neural network (CNN). The effects of pre-processing and feature selection were also investigated. The pre-processing steps are basic pre-processing, mention conversion, punctuation conversion, emoticon conversion, number conversion, formalization, and region name conversion. The data consist of 2810 Instagram comments, each labelled with one of three kinds of response: “answered”, “read”, and “ignored”. Using 10-fold cross-validation, we conducted three types of experiment: baseline, pre-process, and word embedding. The baseline experiment used SVM as the learning algorithm and unigrams as features. The pre-process experiment added pre-processing and feature selection to the baseline. The word embedding experiment used word embeddings as features and either SVM or CNN as the learning algorithm. The best result was obtained in the word embedding experiment, using word embeddings as the feature representation, CNN as the learning algorithm, and pre-processed data, with an accuracy of 84.23%.
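
As a rough illustration of the pre-processing, unigram extraction, and 10-fold splitting described above, the following is a minimal sketch and not the authors' implementation. The placeholder tokens (`<mention>`, `<emoticon>`, `<number>`, `<punct>`), the regular expressions, and the function names are all assumptions made for this example; the abstract does not specify how each conversion was done, nor whether the folds were shuffled or stratified. Formalization and region name conversion are omitted because they would need external word lists.

```python
import re
from collections import Counter

# Hypothetical patterns and placeholder tokens; the abstract does not define them.
MENTION_RE = re.compile(r"@[\w.]*\w")                 # e.g. @toko_abc
EMOTICON_RE = re.compile(r"[:;=][-']?[)(dp](?!\w)")   # e.g. :) ;-) :d (text is lowercased first)
NUMBER_RE = re.compile(r"\d+")
PUNCT_RE = re.compile(r"[?!.,]+")


def preprocess(comment: str) -> str:
    """Lowercase a comment and convert mentions, emoticons, numbers and punctuation."""
    text = comment.lower()
    text = MENTION_RE.sub(" <mention> ", text)
    text = EMOTICON_RE.sub(" <emoticon> ", text)   # before punctuation, so ":)" survives
    text = NUMBER_RE.sub(" <number> ", text)
    text = PUNCT_RE.sub(" <punct> ", text)
    return " ".join(text.split())                  # collapse repeated whitespace


def unigram_features(comment: str) -> Counter:
    """Bag-of-words (unigram) counts over the pre-processed comment."""
    return Counter(preprocess(comment).split())


def k_fold_indices(n_items: int, k: int = 10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


if __name__ == "__main__":
    print(preprocess("@toko_abc harga berapa kak? :) 2 pcs"))
    print(unigram_features("ready kak? ready"))
    # With the 2810 comments reported in the abstract, each fold holds 281 test items.
    print([len(test) for _, test in k_fold_indices(2810, 10)])
```

The resulting unigram counts would then be fed to a classifier such as an SVM. In practice the comments would normally be shuffled (and possibly stratified by label) before splitting, which this sketch leaves out for brevity.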



