Automatic identification of sentiment in unstructured text

Revista de Educación Básica. Pub Date: 2022-06-30. DOI: 10.35429/jbe.2022.15.6.22.28

José Carmen Morales-Castro, José Armando Pérez-Crespo, Tirtha Prasad-Mukhopadhyay, R. Guzmán-Cabrera

Abstract

The constant growth of information in digital format calls for new tools to download, organize, and analyze the information available on the web. One analysis commonly performed on unstructured information is polarity identification. In this paper we present a method for identifying polarity in unstructured texts, specifically texts downloaded from the social network Twitter. The current popularity of social networks has made their users prominent generators of new information every day. Twitter poses a considerable challenge for natural language processing, mainly because the number of opinions is very large and must be processed automatically; in our case, the task is to determine the polarity of a tweet. We present results obtained with several well-known machine learning methods: Support Vector Machine, Naive Bayes, Logistic Regression, Nearest Neighbors, and Random Forest. These are applied in two classification scenarios: cross-validation, and separate training and test sets. Two datasets are used to evaluate the methodology. The best results on both datasets are obtained with Support Vector Machine, and accuracy values above 83% indicate the viability of the methodology.
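
The two evaluation scenarios named in the abstract can be illustrated with a minimal sketch. The abstract does not state which features, libraries, or parameters the authors used, so everything below is an assumption: the toy tweets are invented for illustration, the classifier is a small hand-written multinomial Naive Bayes (one of the five methods listed, chosen only because it fits in a few lines), and the helper names `holdout_accuracy` and `cross_val_accuracy` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Toy labeled tweets, invented for illustration only (not the paper's data).
TWEETS = [
    ("i love this phone it is great", "pos"),
    ("what a wonderful day so happy", "pos"),
    ("great service and friendly staff", "pos"),
    ("love the new update works great", "pos"),
    ("i hate this phone it is terrible", "neg"),
    ("what an awful day so sad", "neg"),
    ("terrible service and rude staff", "neg"),
    ("hate the new update it is broken", "neg"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train(rows):
    texts, labels = zip(*rows)
    return NaiveBayes().fit(texts, labels)


def accuracy(model, rows):
    return sum(model.predict(text) == label for text, label in rows) / len(rows)


def holdout_accuracy(data, test_fraction=0.25, seed=0):
    """Scenario 1: one shuffled split into training and test sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return accuracy(train(rows[n_test:]), rows[:n_test])


def cross_val_accuracy(data, k=4, seed=0):
    """Scenario 2: k-fold cross-validation, averaging accuracy over folds."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train_rows = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(accuracy(train(train_rows), folds[i]))
    return sum(scores) / k


if __name__ == "__main__":
    print("hold-out accuracy:", holdout_accuracy(TWEETS))
    print("4-fold CV accuracy:", cross_val_accuracy(TWEETS))
```

With only eight toy tweets the printed accuracies say nothing about the paper's results; the sketch only shows how a single train/test split differs from averaging over folds. Comparing several classifiers, as the abstract describes, would amount to repeating both functions once per model.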
