Machine Learning for Social Sciences: Stance Classification of User Messages on a Migrant-Critical Discussion Forum

2021 Swedish Workshop on Data Science (SweDS). Pub Date: 2021-12-02. DOI: 10.1109/SweDS53855.2021.9637718

Victoria Yantseva, K. Kucher


Citations: 2

Abstract

In this paper, we present our methodology for supervised stance classification of sparse and imbalanced social media data. We test our framework on a manually labeled dataset of 5700 Swedish-language messages about immigration posted on the Flashback forum, a controversial online discussion platform. Our proposed approach currently achieves a macro-averaged F1-score of 0.72 on test data for a two-class problem, compared with 0.27 for a baseline four-class model. Since effective classification of imbalanced and sparse textual data in under-resourced languages presents certain methodological challenges, our study contributes to the discussion on the best pathways to achieve the highest model performance given the character of the data and the unavailability of large training datasets for this task. Moreover, this work exemplifies the application of ML methodology to social media data, which can be particularly relevant for social scientists who work in this area and are interested in leveraging machine learning in their research field. This methodology and the obtained results provide a foundation for further in-depth, data-driven analyses of social media texts in the Swedish language.
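For readers unfamiliar with the metric reported above, the following is a minimal Python sketch of how a macro-averaged F1-score is computed and why it is a stricter measure than accuracy on imbalanced data. It is not the authors' code: the function name `macro_f1`, the class labels, and the toy predictions are all made up for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy data: 8 messages of one class, 2 of the other.
y_true = ["neg"] * 8 + ["pos"] * 2
# A classifier that always predicts the majority class.
y_pred = ["neg"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}, macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

Because every class contributes equally to the average regardless of its size, a model that ignores the minority class is penalized heavily even when its accuracy looks acceptable.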



