Sentiment Polarity Classification using Minimal Feature Vectors and Machine Learning Algorithms

The 12th International Conference on Advances in Information Technology. Pub Date: 2021-06-29. DOI: 10.1145/3468784.3469947

N. Wattanakitrungroj, Nichapat Pinpo, Sasiporn Tongman

Citation count: 0

Abstract

Social media users routinely post text comments to express their opinions, and these texts can be analyzed and classified as expressing either a positive or a negative attitude. Before a classifier can be built, feature vectors representing the texts must be designed and prepared. Texts are generally represented by vectors of term weights or term frequencies, so the length of the feature vector equals the number of terms in the dictionary derived from all words that occur in the texts. A large dictionary therefore yields high-dimensional vectors and long processing times for training and testing text classification models. In this paper, low-dimensional vectors, called V8D, are proposed for representing the texts. The set of positive and negative words, including negation words, that carry significant meaning was used as the information for creating these vectors. Four machine learning algorithms, namely k-Nearest Neighbors, the Naïve Bayes classifier, Artificial Neural Networks, and Support Vector Machine, were applied to classify the opinion texts. In experiments on eight data sets from various domains, the proposed V8D vectors were compared with traditional TF-IDF vectors in terms of predictive correctness. The experimental results show that representing texts as V8D vectors for opinion text classification provides the best efficiency in both space usage and processing time.
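The contrast the abstract draws between dictionary-sized vectors and a small fixed-length representation can be illustrated with a minimal Python sketch. It is not the authors' V8D construction: the abstract does not say which features make up V8D, so the four lexicon counts, the toy word lists, the one-token negation rule, and all function names below are assumptions made for illustration only. Raw term counts also stand in for TF-IDF weights to keep the example dependency-free.

```python
from collections import Counter

# Hypothetical toy lexicons; the paper's actual word lists are not given in the abstract.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}
NEGATION = {"not", "never", "no"}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(texts):
    """Return the sorted list of distinct tokens across all texts."""
    return sorted({tok for text in texts for tok in tokenize(text)})


def term_frequency_vector(text, vocabulary):
    """One count per vocabulary term, so the length grows with the corpus."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def lexicon_vector(text):
    """Fixed-length vector: [positive, negative, negated positive, negated negative].

    Assumption: a negation word affects only the token immediately after it.
    """
    pos = neg = negated_pos = negated_neg = 0
    negated = False
    for tok in tokenize(text):
        if tok in NEGATION:
            negated = True
            continue
        if tok in POSITIVE:
            if negated:
                negated_pos += 1
            else:
                pos += 1
        elif tok in NEGATIVE:
            if negated:
                negated_neg += 1
            else:
                neg += 1
        negated = False  # the negation scope ends after one token
    return [pos, neg, negated_pos, negated_neg]


if __name__ == "__main__":
    texts = [
        "the food was good",
        "the service was not good",
        "i hate the bad service",
    ]
    vocabulary = build_vocabulary(texts)
    for text in texts:
        print(len(term_frequency_vector(text, vocabulary)), lexicon_vector(text))
```

In this sketch the term-frequency vector gains one dimension for every new word in the corpus, while the lexicon vector stays at four entries regardless of corpus size. That is the space and time trade-off the abstract describes. Either kind of vector could then be passed to any of the four classifiers the abstract names.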
