Eissa Alshari, A. Azman, S. Doraisamy, N. Mustapha, Mustafa Alkeshr
2017 28th International Workshop on Database and Expert Systems Applications (DEXA), published 2017-08-01
DOI: 10.1109/DEXA.2017.41 (https://doi.org/10.1109/DEXA.2017.41)
Improvement of Sentiment Analysis Based on Clustering of Word2Vec Features
Recently, many researchers have shown interest in using Word2Vec as the source of features for text classification tasks such as sentiment analysis. Its ability to model high-quality distributional semantics among words has contributed to its success in many of these tasks. However, the high dimensionality of Word2Vec features increases the complexity of the classifier. This paper proposes a method for constructing a Word2Vec-based feature set for sentiment analysis. The method clusters the terms in the vocabulary based on a set of opinion words from a sentiment lexical dictionary, and the feature set for classification is then constructed from the resulting clusters. The effectiveness of the proposed method is evaluated on the Internet Movie Review Dataset with two classifiers, the Support Vector Machine and Logistic Regression. The results are promising, showing that the proposed method can be more effective than the baseline approaches.
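The abstract does not spell out how the clustering or the feature construction is carried out, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each vocabulary term is assigned to the opinion word with the highest cosine similarity, and that a document is represented by the number of its tokens that fall into each cluster. The function names `assign_clusters` and `document_features`, the toy two-dimensional vectors, and the two-word opinion list are all hypothetical; in practice the vectors would come from a trained Word2Vec model and the opinion words from a sentiment lexicon.

```python
import numpy as np


def unit(vector):
    """Return the vector scaled to unit length (assumes a non-zero vector)."""
    vector = np.asarray(vector, dtype=float)
    return vector / np.linalg.norm(vector)


def assign_clusters(vocab_vectors, opinion_words):
    """Map each vocabulary term to the index of its most similar opinion word.

    Assumption: similarity is cosine similarity between word vectors.
    """
    terms = list(vocab_vectors)
    centroids = np.stack([unit(vocab_vectors[w]) for w in opinion_words])
    matrix = np.stack([unit(vocab_vectors[t]) for t in terms])
    # Dot products of unit vectors are cosine similarities.
    nearest = (matrix @ centroids.T).argmax(axis=1)
    return {term: int(cluster) for term, cluster in zip(terms, nearest)}


def document_features(tokens, term_to_cluster, n_clusters):
    """Count how many tokens of a document fall into each cluster.

    Assumption: out-of-vocabulary tokens are simply skipped.
    """
    features = np.zeros(n_clusters)
    for token in tokens:
        cluster = term_to_cluster.get(token)
        if cluster is not None:
            features[cluster] += 1
    return features


# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
toy_vectors = {
    "good": [1.0, 0.1],
    "bad": [-1.0, 0.1],
    "great": [0.9, 0.3],
    "awful": [-0.8, 0.2],
    "superb": [0.7, 0.2],
}
opinion_words = ["good", "bad"]  # hypothetical lexicon entries

term_to_cluster = assign_clusters(toy_vectors, opinion_words)
features = document_features(
    ["great", "superb", "awful", "plot"], term_to_cluster, len(opinion_words)
)
```

Under these assumptions the feature vector has one dimension per opinion word rather than one per embedding dimension or vocabulary term, which is the kind of reduction in classifier input size the abstract motivates. The resulting vectors could then be passed to a Support Vector Machine or Logistic Regression classifier.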