Sentiment Clustering By Mahalanobis Distance
H. M. Abdul Fattah, Md. Masum Al Masba, K. M. Azharul Hasan
2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), 2018-09-01
DOI: 10.1109/CEEICT.2018.8628142
Abstract
Sentiment clustering is the computational study of people's opinions, sentiments, attitudes, and emotions. In this paper, we describe the use of Mahalanobis Distance (MD) to cluster users' review comments. MD is widely used for outlier detection in data mining. We classify the comments into four clusters, namely 'excellent', 'good', 'bad', and 'not recommended', even though the training data carries only two labels, 'positive' and 'negative'. To use this measure, a Representative Term Document Matrix (RTDM) and an Inverse Covariance Matrix (C⁻¹) are computed from the training data. Using the RTDM and C⁻¹, MD is calculated, and clustering thresholds are fixed based on the MD values of the training data. These thresholds then determine the final outcome. On the Amazon watch reviews dataset of 62,485 reviews, the MD-based clustering approach achieved good accuracy. We also applied the approach to comments collected from people on the street and obtained over 90% accuracy.
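The pipeline in the abstract computes MD with an inverse covariance matrix from training data, fixes thresholds from the training MD values, and maps each review to one of four clusters. The following is a minimal, hypothetical Python sketch of that pipeline. It is not the authors' implementation, because the abstract does not say how the RTDM is built or how the thresholds map to the four clusters. The sketch makes these assumptions:

- Each review is already a small term-count vector standing in for an RTDM row.
- One C⁻¹ is estimated from all training vectors, with one mean per class.
- Each class's median training distance serves as its threshold.
- A small ridge term is added to keep the covariance matrix invertible.

The names `fit` and `cluster`, and the ridge value, are illustrative only.

```python
"""Hypothetical sketch of four-way sentiment clustering with Mahalanobis distance.

Assumptions (not taken from the paper): each review is already a small
term-count vector standing in for an RTDM row, a single inverse covariance
matrix is estimated from all training vectors, and each class's median
training distance serves as its clustering threshold.
"""
from statistics import median


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, ridge=1e-3):
    """Sample covariance matrix plus a small ridge on the diagonal.

    The ridge (an assumption) keeps the matrix invertible when sparse
    term counts would otherwise make it singular.
    """
    mu = mean_vector(rows)
    d, n = len(mu), len(rows)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    d = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis(x, mu, inv_cov):
    """MD(x) = sqrt((x - mu)^T C^-1 (x - mu))."""
    diff = [a - b for a, b in zip(x, mu)]
    d = len(diff)
    q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))
    return max(q, 0.0) ** 0.5  # clamp tiny negative rounding error


def fit(positive, negative):
    """Estimate C^-1 from all training vectors, one mean per class, and thresholds."""
    inv_cov = invert(covariance(positive + negative))
    mu_pos, mu_neg = mean_vector(positive), mean_vector(negative)
    # Hypothetical threshold rule: median in-class training distance.
    t_pos = median(mahalanobis(x, mu_pos, inv_cov) for x in positive)
    t_neg = median(mahalanobis(x, mu_neg, inv_cov) for x in negative)
    return inv_cov, mu_pos, mu_neg, t_pos, t_neg


def cluster(x, model):
    """Map one review vector to one of the four clusters (illustrative rule)."""
    inv_cov, mu_pos, mu_neg, t_pos, t_neg = model
    d_pos = mahalanobis(x, mu_pos, inv_cov)
    d_neg = mahalanobis(x, mu_neg, inv_cov)
    if d_pos <= d_neg:
        # Closer to the positive class: near its center counts as 'excellent'.
        return "excellent" if d_pos <= t_pos else "good"
    # Closer to the negative class: near its center counts as 'not recommended'.
    return "not recommended" if d_neg <= t_neg else "bad"


if __name__ == "__main__":
    # Toy vectors: [count of positive terms, count of negative terms].
    pos = [[3, 0], [4, 1], [2, 0], [5, 1], [3, 1]]
    neg = [[0, 3], [1, 4], [0, 2], [1, 5], [0, 4]]
    model = fit(pos, neg)
    for review in ([4, 0], [3, 2], [1, 3], [0, 5]):
        print(review, cluster(review, model))
```

Pooling the covariance over both classes keeps a single C⁻¹, which matches the abstract's mention of one inverse covariance matrix computed from the training data. The two-centroid, median-threshold rule is only one plausible way to turn two training labels into four clusters. The paper's actual threshold scheme may differ.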