N. Wattanakitrungroj, Nichapat Pinpo, Sasiporn Tongman
The 12th International Conference on Advances in Information Technology. Published 2021-06-29. DOI: 10.1145/3468784.3469947 (https://doi.org/10.1145/3468784.3469947)
Sentiment Polarity Classification using Minimal Feature Vectors and Machine Learning Algorithms
Social media users post text comments to express their opinions, and these texts can be analyzed and classified as expressing either a positive or a negative attitude. Before a classifier can be built, feature vectors representing the texts must be designed and prepared. Texts are generally represented by vectors of weights or frequencies of the terms that appear in them, so the length of the feature vector equals the number of terms in the dictionary derived from all words occurring in the texts. A large dictionary yields high-dimensional vectors and, in turn, long processing times for training and testing text classification models. This paper proposes low-dimensional vectors, V8D, for representing the texts. These vectors are built from sets of positive and negative words, together with negation words, which carry significant meaning. Four machine learning algorithms, namely k-Nearest Neighbors, the Naïve Bayes classifier, Artificial Neural Networks, and Support Vector Machine, were applied to classify the opinion texts. In experiments on eight data sets from various domains, the proposed V8D vectors were compared with the traditional TF-IDF vector in terms of predictive correctness. The experimental results show that representing opinion texts as V8D vectors provides the best efficiency in both space usage and processing time.
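
To make the dimensionality contrast concrete, the following is a minimal Python sketch of a fixed-length, lexicon-based feature vector set against the vocabulary size that a term-frequency representation would need. The abstract does not specify the eight components of V8D, so the eight features chosen here, the tiny word lists, and the function names `tokenize`, `lexicon_features`, and `vocabulary_size` are all assumptions made for illustration, not the authors' definition.

```python
from typing import List

# Tiny illustrative lexicons (hypothetical; not the paper's word lists).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}
NEGATION = {"not", "no", "never"}

PUNCT = ".,!?;:"


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (w.strip(PUNCT) for w in text.lower().split())
    return [w for w in stripped if w]


def lexicon_features(text: str) -> List[float]:
    """Map a text to a fixed 8-dimensional vector.

    The eight features are illustrative assumptions, not the paper's V8D.
    """
    tokens = tokenize(text)
    pos = neg = negation = negated_pos = negated_neg = 0
    for i, tok in enumerate(tokens):
        # A sentiment word counts as negated if the previous token is a negation word.
        prev_is_negation = i > 0 and tokens[i - 1] in NEGATION
        if tok in NEGATION:
            negation += 1
        elif tok in POSITIVE:
            pos += 1
            negated_pos += int(prev_is_negation)
        elif tok in NEGATIVE:
            neg += 1
            negated_neg += int(prev_is_negation)
    n = len(tokens)
    return [
        float(pos),
        float(neg),
        float(negation),
        float(negated_pos),
        float(negated_neg),
        float(n),
        pos / n if n else 0.0,  # share of positive words
        neg / n if n else 0.0,  # share of negative words
    ]


def vocabulary_size(texts: List[str]) -> int:
    """Number of distinct terms, i.e. the length of a term-frequency vector."""
    return len({tok for t in texts for tok in tokenize(t)})


if __name__ == "__main__":
    corpus = ["This phone is not good.", "I love the excellent camera!"]
    print("term-vector length:", vocabulary_size(corpus))
    for doc in corpus:
        print(len(lexicon_features(doc)), lexicon_features(doc))
```

The lexicon-based vector keeps the same length however many distinct words the corpus contains, whereas the term-based vector grows with the dictionary. Such fixed-length vectors could then be passed to any of the four classifiers named in the abstract.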