Message order matters: A robust author profiling approach for social media platforms

IF 2.9, JCR Q1 (Social Sciences)

Online Social Networks and Media. Pub Date: 2025-05-22. DOI: 10.1016/j.osnem.2025.100316

Mehmet Deniz Türkmen, Mucahid Kutlu

Volume 47, Article 100316. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S2468696425000175

Citation count: 0

Abstract

Because sentence order can affect the meaning of a text, transformer models and recurrent neural networks (RNNs) take token order into account. However, this property can hurt the classification of social media accounts, since users may post consecutive messages on entirely unrelated topics. In this study, we explore how to improve the performance of order-sensitive models on various author profiling tasks on social media. We first draw attention to the input length limit of transformer models and propose a message selection method that also reduces the noise caused by irrelevant messages. In addition, we show that arbitrarily concatenating messages can be problematic. Therefore, we propose creating multiple variants of the data by shuffling messages, classifying each variant separately, and then aggregating the predictions. In our comprehensive experiments, we focus on age, gender, occupation, and bot detection tasks. We show that the proposed content selection and shuffling-based methods lead to slight improvements in the transformer model’s performance on age and gender detection tasks. However, our approach yields noticeable performance gains for the BiLSTM model. Additionally, we observe that the shuffling method serves as an effective means of augmenting training data, further enhancing the models’ performance. Moreover, our shuffling-based approach increases the models’ resistance to adversarial attacks in gender and occupation detection tasks without compromising their performance in age detection.
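The shuffle, classify, and aggregate idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how predictions are aggregated, so majority voting is assumed here, and `shuffled_variants`, `predict_with_shuffling`, and `toy_classifier` are hypothetical names introduced only for illustration. The toy classifier stands in for an order-sensitive model such as a transformer or BiLSTM.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def shuffled_variants(messages: Sequence[str], n_variants: int, seed: int = 0) -> List[str]:
    """Build n_variants concatenations of the messages, each in a random order."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    variants = []
    for _ in range(n_variants):
        order = list(messages)
        rng.shuffle(order)  # in-place permutation of the message order
        variants.append(" ".join(order))
    return variants


def predict_with_shuffling(
    messages: Sequence[str],
    classify: Callable[[str], str],
    n_variants: int = 5,
    seed: int = 0,
) -> str:
    """Classify every shuffled variant and return the most frequent label.

    Majority voting is an assumption of this sketch; the abstract only says
    that the per-variant predictions are aggregated.
    """
    votes = Counter(classify(v) for v in shuffled_variants(messages, n_variants, seed))
    return votes.most_common(1)[0][0]


def toy_classifier(text: str) -> str:
    """Hypothetical order-sensitive stand-in: it only looks at the first word."""
    return "bot" if text.split()[0].lower() == "buy" else "human"


if __name__ == "__main__":
    account_messages = [
        "buy followers now",
        "great match last night",
        "new recipe on my blog",
    ]
    print(predict_with_shuffling(account_messages, toy_classifier, n_variants=7))
```

Because the toy classifier depends on which message comes first, a single arbitrary concatenation could flip its label; voting over several shuffled variants reduces that dependence. The same `shuffled_variants` helper could also be applied to training accounts to produce additional training examples, in the spirit of the augmentation use mentioned in the abstract.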


