Supervised Sentiment Analysis of Science Topics: Developing a Training Set of Tweets in Spanish
Patricia Sánchez-Holgado, C. A. Calderón
J. Inf. Technol. Res., published 2020-07-01. DOI: 10.4018/jitr.2020070105
Twitter is one of the largest sources of real-time information on the Internet, continuously fed by millions of users around the world who post messages with their opinions, concerns, information, or simply their daily happenings. Analyzing this massive volume of data is a challenge, and finding ways to extract everything it can reveal today about society and the market remains an open goal. The science communication sector is still discovering what Web 2.0 and social networks can offer for reaching all audiences. This article develops a machine learning model that classifies Spanish-language Twitter messages on science topics. Training this type of model requires building a science-specific corpus in Spanish, which is one of the most laborious tasks involved. The classifier predicts the sentiment of tweets in real time with a confidence level above 80%, and in evaluation it achieves 72% accuracy.
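The abstract does not name the learning algorithm, so the following is only an illustrative sketch of the supervised workflow it describes: train a classifier on a labeled corpus of Spanish tweets, predict the sentiment of new messages together with a confidence score, and measure accuracy on held-out data. It assumes multinomial Naive Bayes with add-one smoothing, a swapped-in technique chosen for brevity. The labels `positivo`/`negativo`, the example tweets, and the names `NaiveBayesSentiment`, `tokenize`, and `accuracy` are hypothetical and do not come from the authors' corpus or code.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus; NOT the authors' science-topic training set.
TRAIN = [
    ("me encanta este descubrimiento científico", "positivo"),
    ("gran avance en la investigación sobre el cáncer", "positivo"),
    ("la ciencia es fascinante", "positivo"),
    ("qué noticia tan triste para la ciencia", "negativo"),
    ("recortes terribles en investigación", "negativo"),
    ("este estudio es un fraude", "negativo"),
]
# Hypothetical held-out tweets, used only for evaluation.
TEST = [
    ("un avance fascinante", "positivo"),
    ("recortes terribles", "negativo"),
    ("me encanta la ciencia", "positivo"),
    ("qué fraude de estudio", "negativo"),
]


def tokenize(text):
    # Lowercase and split on Unicode word characters (keeps á, é, ñ, ...).
    return re.findall(r"\w+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one smoothing (an assumed model choice)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def _log_scores(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior of class c
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, text):
        # Return (label, model posterior probability of that label).
        scores = self._log_scores(text)
        top = max(scores.values())
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        label = max(weights, key=weights.get)
        return label, weights[label] / sum(weights.values())


def accuracy(model, texts, labels):
    # Fraction of texts whose predicted label matches the gold label.
    correct = sum(model.predict(t)[0] == y for t, y in zip(texts, labels))
    return correct / len(labels)


if __name__ == "__main__":
    model = NaiveBayesSentiment().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
    print(model.predict("un avance fascinante"))
    print(accuracy(model, [t for t, _ in TEST], [y for _, y in TEST]))
```

Accuracy is computed on a separate held-out list because scoring the training tweets would overstate performance. The normalized posterior returned by `predict` is one possible per-tweet confidence that could be thresholded; the abstract does not say how its own confidence figure was obtained.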