Supervised Sentiment Analysis of Science Topics: Developing a Training Set of Tweets in Spanish

Patricia Sánchez-Holgado, C. A. Calderón
Journal of Information Technology Research (J. Inf. Technol. Res.), published 2020-07-01
DOI: 10.4018/jitr.2020070105 (https://doi.org/10.4018/jitr.2020070105)
Citations: 5

Abstract

Twitter is one of the largest sources of real-time information on the Internet and is continuously fed by millions of users around the world. These users publish text messages containing their opinions, concerns, information, or simply their daily happenings. Analyzing this massive volume of data is a challenge, and finding ways to extract everything it can reveal about society and the market is an open goal. The science communication sector is still discovering what Web 2.0 and social networks can offer for reaching all audiences. This article develops a machine learning model that classifies Spanish-language Twitter messages on science topics. Training this type of model requires building a specific Spanish corpus on science, which is one of the most laborious tasks. The classifier can predict the sentiment of a Twitter message in real time with a confidence above 80%. In evaluation, it achieves 72% accuracy.
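The abstract does not state which algorithm the classifier uses, so the following is only an illustrative sketch, not the authors' method. It assumes a multinomial Naive Bayes model (a swapped-in technique) trained on a few hypothetical labeled Spanish tweets, and it separates the two figures the abstract reports: the per-message confidence of a prediction (here a hypothetical `min_confidence=0.8` threshold below which the model abstains) and accuracy measured against labeled data. The class `NaiveBayesSentiment`, the `accuracy` helper, and the training texts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also strip
    # URLs, @mentions, punctuation and accents (assumption, not from the paper).
    return text.lower().split()


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)          # documents per class (priors)
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {l: sum(self.word_counts[l].values()) for l in self.labels}
        return self

    def predict_proba(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.labels:
            score = math.log(self.doc_counts[label] / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][w] + 1)
                                      / (self.totals[label] + len(self.vocab)))
            scores[label] = score
        # Normalize log scores into probabilities (numerically stable softmax).
        top = max(scores.values())
        exp = {l: math.exp(s - top) for l, s in scores.items()}
        z = sum(exp.values())
        return {l: v / z for l, v in exp.items()}

    def predict(self, text, min_confidence=0.8):
        probs = self.predict_proba(text)
        label = max(probs, key=probs.get)
        # Abstain (None) when the top class falls below the confidence threshold.
        return (label if probs[label] >= min_confidence else None), probs[label]


def accuracy(model, texts, labels):
    # Accuracy counts every prediction, so the threshold is disabled here.
    correct = sum(model.predict(t, min_confidence=0.0)[0] == y
                  for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy corpus; a real training set would be far larger.
train_texts = [
    "me encanta este descubrimiento científico",
    "gran avance de la ciencia",
    "qué interesante el nuevo estudio",
    "odio las noticias falsas sobre vacunas",
    "terrible error en el experimento",
    "malo y aburrido este artículo",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("gran descubrimiento interesante"))
print(model.predict("el este"))  # ambiguous words: the model abstains
```

A confidence threshold and an accuracy score answer different questions: the first says how sure the model is about one message, the second how often it is right on a labeled test set, which is why a classifier can report confident predictions while its overall accuracy is lower.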