Message order matters: A robust author profiling approach for social media platforms

Q1 Social Sciences
Mehmet Deniz Türkmen, Mucahid Kutlu
Journal: Online Social Networks and Media, volume 47, Article 100316
Published: 2025-05-22
DOI: 10.1016/j.osnem.2025.100316
URL: https://www.sciencedirect.com/science/article/pii/S2468696425000175
Citations: 0

Abstract

Because the order of sentences can affect the meaning of a text, transformer models and recurrent neural networks (RNNs) take token order into account. However, this property can hurt the classification of social media accounts, as users may post consecutive messages on entirely different topics. In this study, we explore how to improve the performance of order-sensitive models on various author profiling tasks on social media. We first draw attention to the transformer models’ input limit and propose a message selection method that also reduces noise caused by irrelevant messages. In addition, we show that arbitrarily concatenating messages can be problematic. Therefore, we propose creating multiple variants of the data by shuffling messages, classifying each variant separately, and then aggregating the predictions. In our comprehensive experiments, we focus on age, gender, occupation, and bot detection tasks. We show that the proposed content selection and shuffling-based methods lead to slight improvements in the transformer model’s performance on age and gender detection tasks, whereas our approach yields noticeable performance gains for the BiLSTM model. Additionally, we observe that the shuffling method serves as an effective means of augmenting training data, further enhancing the models’ performance. Moreover, our shuffling-based approach enhances the models’ resistance to adversarial attacks in gender and occupation detection tasks without compromising their performance in age detection.
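The shuffle, classify, and aggregate idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how predictions are aggregated, so majority voting is assumed here, and the names `shuffled_variants`, `predict_account`, `classify`, `n_variants`, and `sep` are hypothetical. `classify` stands in for any order-sensitive model (for example a fine-tuned transformer or a BiLSTM) that maps one concatenated text to a label.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def shuffled_variants(messages: Sequence[str], n_variants: int,
                      seed: int = 0) -> List[List[str]]:
    """Return n_variants random orderings of one account's messages."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    variants = []
    for _ in range(n_variants):
        order = list(messages)  # copy, so the caller's list is untouched
        rng.shuffle(order)
        variants.append(order)
    return variants


def predict_account(messages: Sequence[str],
                    classify: Callable[[str], str],
                    n_variants: int = 5,
                    sep: str = " ",
                    seed: int = 0) -> str:
    """Classify each shuffled variant separately, then aggregate.

    Aggregation by majority vote is an assumption of this sketch.
    """
    votes = Counter(
        classify(sep.join(variant))
        for variant in shuffled_variants(messages, n_variants, seed)
    )
    # most_common(1) returns the label with the highest vote count
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy stand-in for a model with an input limit: it only "reads"
    # the first 20 characters, so its output depends on message order.
    def toy_classifier(text: str) -> str:
        return "bot" if "buy" in text[:20] else "human"

    account = ["buy followers now", "lovely weather today", "match was great"]
    print(predict_account(account, toy_classifier, n_variants=7))
```

The same `shuffled_variants` helper could also be applied to training accounts to produce additional training examples, which is the augmentation use the abstract mentions; how the authors actually do this is not specified here.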
Source journal: Online Social Networks and Media (Social Sciences: Communication)
CiteScore: 10.60
Self-citation rate: 0.00%
Articles published: 32
Review time: 44 days