{"title":"Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis","authors":"Nirmalya Thakur","doi":"arxiv-2409.05292","DOIUrl":null,"url":null,"abstract":"The world is currently experiencing an outbreak of mpox, which has been\ndeclared a Public Health Emergency of International Concern by WHO. No prior\nwork related to social media mining has focused on the development of a dataset\nof Instagram posts about the mpox outbreak. The work presented in this paper\naims to address this research gap and makes two scientific contributions to\nthis field. First, it presents a multilingual dataset of 60,127 Instagram posts\nabout mpox, published between July 23, 2022, and September 5, 2024. The\ndataset, available at https://dx.doi.org/10.21227/7fvc-y093, contains Instagram\nposts about mpox in 52 languages. For each of these posts, the Post ID, Post\nDescription, Date of publication, language, and translated version of the post\n(translation to English was performed using the Google Translate API) are\npresented as separate attributes in the dataset. After developing this dataset,\nsentiment analysis, hate speech detection, and anxiety or stress detection were\nperformed. This process included classifying each post into (i) one of the\nsentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or\nneutral, (ii) hate or not hate, and (iii) anxiety/stress detected or no\nanxiety/stress detected. These results are presented as separate attributes in\nthe dataset. Second, this paper presents the results of performing sentiment\nanalysis, hate speech analysis, and anxiety or stress analysis. The variation\nof the sentiment classes - fear, surprise, joy, sadness, anger, disgust, and\nneutral were observed to be 27.95%, 2.57%, 8.69%, 5.94%, 2.69%, 1.53%, and\n50.64%, respectively. In terms of hate speech detection, 95.75% of the posts\ndid not contain hate and the remaining 4.25% of the posts contained hate.\nFinally, 72.05% of the posts did not indicate any anxiety/stress, and the\nremaining 27.95% of the posts represented some form of anxiety/stress.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Social and Information Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.05292","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The world is currently experiencing an outbreak of mpox, which has been
declared a Public Health Emergency of International Concern by WHO. No prior
work related to social media mining has focused on the development of a dataset
of Instagram posts about the mpox outbreak. The work presented in this paper
aims to address this research gap and makes two scientific contributions to
this field. First, it presents a multilingual dataset of 60,127 Instagram posts
about mpox, published between July 23, 2022, and September 5, 2024. The
dataset, available at https://dx.doi.org/10.21227/7fvc-y093, contains Instagram
posts about mpox in 52 languages. For each of these posts, the Post ID, Post
Description, Date of publication, language, and translated version of the post
(translation to English was performed using the Google Translate API) are
presented as separate attributes in the dataset. After developing this dataset,
sentiment analysis, hate speech detection, and anxiety or stress detection were
performed. This process included classifying each post into (i) one of the
sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or
neutral, (ii) hate or not hate, and (iii) anxiety/stress detected or no
anxiety/stress detected. These results are presented as separate attributes in
the dataset. Second, this paper presents the results of performing sentiment
analysis, hate speech analysis, and anxiety or stress analysis. The variation
of the sentiment classes - fear, surprise, joy, sadness, anger, disgust, and
neutral were observed to be 27.95%, 2.57%, 8.69%, 5.94%, 2.69%, 1.53%, and
50.64%, respectively. In terms of hate speech detection, 95.75% of the posts
did not contain hate and the remaining 4.25% of the posts contained hate.
Finally, 72.05% of the posts did not indicate any anxiety/stress, and the
remaining 27.95% of the posts represented some form of anxiety/stress.