Authors: Mehmet Deniz Türkmen, Mucahid Kutlu
Journal: Online Social Networks and Media, vol. 47, Article 100316 (published 2025-05-22)
DOI: 10.1016/j.osnem.2025.100316
URL: https://www.sciencedirect.com/science/article/pii/S2468696425000175
Message order matters: A robust author profiling approach for social media platforms
As the order of sentences can impact the meaning of texts, transformer models and recurrent neural networks (RNNs) also consider the order of the tokens. However, this feature can negatively affect the classification of social media accounts, as users might share messages on entirely different topics in consecutive order. In this study, we explore how to enhance the performance of models that take into account word order for various author profiling tasks on social media. We first draw attention to the transformer models’ input limit and propose a message selection method that also reduces noise caused by irrelevant messages. In addition, we show that arbitrarily concatenating messages can be problematic. Therefore, we propose creating multiple variants of data by shuffling messages, classifying each variant separately, and then aggregating the predictions. In our comprehensive experiments, we focus on age, gender, occupation, and bot detection tasks. We show that the proposed content selection and shuffling-based methods lead to slight improvements in the transformer model’s performance for age and gender detection tasks. However, our approach yields noticeable performance increases for the BiLSTM model. Additionally, we observe that the shuffling method serves as an effective means to augment training data, further enhancing the models’ performance. Moreover, our shuffling-based approach enhances the models’ resistance to adversarial attacks in gender and occupation detection tasks without compromising their performance in age detection.
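The shuffle, classify, and aggregate idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how predictions are aggregated, so the majority vote used here is an assumption, and the function names (`make_shuffled_variants`, `predict_with_shuffling`), the `classify` callable, and the separator are hypothetical placeholders rather than the authors' implementation.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def make_shuffled_variants(
    messages: Sequence[str], n_variants: int, seed: int = 0, sep: str = " "
) -> List[str]:
    """Concatenate the same messages in n_variants random orders."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    variants = []
    for _ in range(n_variants):
        order = list(messages)  # copy, so the caller's sequence is untouched
        rng.shuffle(order)
        variants.append(sep.join(order))
    return variants


def predict_with_shuffling(
    messages: Sequence[str],
    classify: Callable[[str], str],
    n_variants: int = 5,
    seed: int = 0,
) -> str:
    """Classify each shuffled variant separately and return the majority label.

    Majority voting is an assumption; the abstract only says that the
    predictions are aggregated. Ties go to the label seen first.
    """
    variants = make_shuffled_variants(messages, n_variants, seed)
    votes = Counter(classify(text) for text in variants)
    return votes.most_common(1)[0][0]
```

Here `classify` stands in for any order-sensitive model (for example, a fine-tuned transformer or a BiLSTM wrapped in a function that maps text to a label). The same `make_shuffled_variants` helper could also be applied to training accounts to produce additional training examples, in the spirit of the augmentation use mentioned in the abstract.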