R. A. A. Jonker, Roshan Poudel, Olga Fajarda, Sérgio Matos, J. L. Oliveira, Rui Pedro Lopes
Portuguese Twitter Dataset on COVID-19
2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), published 2022-11-10
DOI: https://doi.org/10.1109/ASONAM55673.2022.10068592
Abstract
Over the last two years, the COVID-19 pandemic has affected hundreds of millions of people around the world. As in many crises, people have turned to social media platforms such as Twitter to communicate and share information. Twitter datasets have long been used in research to extract valuable information, and several large COVID-19 Twitter datasets have been released over the last two years. However, none of these datasets contains only Portuguese Tweets, even though Portuguese is reported to be one of the top five languages used on Twitter. In this paper, we present the first large-scale Portuguese COVID-19 Twitter dataset. The dataset contains over 19 million Tweets spanning 2020 and 2021, allowing the course of the pandemic over these two years to be analyzed. We also conducted a sentiment analysis on the dataset and correlated spikes in Tweet count and sentiment scores with news articles and government announcements in Portugal and Brazil. The dataset is available at: https://github.com/bioinformatics-ua/Portuguese-Covid19-Dataset