{"title":"DisTGranD: Granular event/sub-event classification for disaster response","authors":"Ademola Adesokan , Sanjay Madria , Long Nguyen","doi":"10.1016/j.osnem.2024.100297","DOIUrl":"10.1016/j.osnem.2024.100297","url":null,"abstract":"<div><div>Efficient crisis management relies on prompt and precise analysis of disaster data from various sources, including social media. The advantage of fine-grained, annotated, class-labeled data is the provision of a diversified range of information compared to high-level label datasets. In this study, we introduce a dataset richly annotated at a low level to more accurately classify crisis-related communication. To this end, we first present DisTGranD, an extensively annotated dataset of over 47,600 tweets related to earthquakes and hurricanes. The dataset uses the Automatic Content Extraction (ACE) standard to provide detailed classification into dual-layer annotation for events and sub-events and identify critical triggers and supporting arguments. The inter-annotator evaluation of DisTGranD demonstrated high agreement among annotators, with Fleiss Kappa scores of 0.90 and 0.93 for event and sub-event types, respectively. Moreover, a transformer-based embedded phrase extraction method showed XLNet achieving an impressive 96% intra-label similarity score for event type and 97% for sub-event type. We further proposed a novel deep learning classification model, RoBiCCus, which achieved <span><math><mrow><mo>≥</mo><mn>90</mn><mtext>%</mtext></mrow></math></span> accuracy and F1-Score in the event and sub-event type classification tasks on our DisTGranD dataset and outperformed other models on publicly available disaster datasets. DisTGranD dataset represents a nuanced class-labeled framework for detecting and classifying disaster-related social media content, which can significantly aid decision-making in disaster response. This robust dataset enables deep-learning models to provide insightful, actionable data during crises. Our annotated dataset and code are publicly available on GitHub <span><span><sup>1</sup></span></span>.</div></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"45 ","pages":"Article 100297"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143095218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BD2TSumm: A Benchmark Dataset for Abstractive Disaster Tweet Summarization","authors":"Piyush Kumar Garg , Roshni Chakraborty , Sourav Kumar Dandapat","doi":"10.1016/j.osnem.2024.100299","DOIUrl":"10.1016/j.osnem.2024.100299","url":null,"abstract":"<div><div>Online social media platforms, such as Twitter, are mediums for valuable updates during disasters. However, the large scale of available information makes it difficult for humans to identify relevant information from the available information. An automatic summary of these tweets provides identification of relevant information easy and ensures a holistic overview of a disaster event to process the aid for disaster response. In literature, there are two types of abstractive disaster tweet summarization approaches based on the format of output summary: key-phrased-based (where summary is a set of key-phrases) and sentence-based (where summary is a paragraph consisting of sentences). Existing sentence-based abstractive approaches are either unsupervised or supervised. However, both types of approaches require a sizable amount of ground-truth summaries for training and/or evaluation such that they work on disaster events irrespective of type and location. The lack of abstractive disaster ground-truth summaries and guidelines for annotation motivates us to come up with a systematic procedure to create abstractive sentence ground-truth summaries of disaster events. Therefore, this paper presents a two-step systematic annotation procedure for sentence-based abstractive summary creation. Additionally, we release <em>BD2TSumm</em>, i.e., a benchmark ground-truth dataset for evaluating the sentence-based abstractive summarization approaches for disaster events. <em>BD2TSumm</em> consists of 15 ground-truth summaries belonging to 5 different continents and both natural and man-made disaster types. Furthermore, to ensure the high quality of the generated ground-truth summaries, we evaluate them qualitatively (using five metrics) and quantitatively (using two metrics). Finally, we compare 12 existing State-Of-The-Art (SOTA) abstractive summarization approaches on these ground-truth summaries using ROUGE-N F1-score.</div></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"45 ","pages":"Article 100299"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143095219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Influencer self-disclosure practices on Instagram: A multi-country longitudinal study","authors":"Thales Bertaglia , Catalina Goanta , Gerasimos Spanakis , Adriana Iamnitchi","doi":"10.1016/j.osnem.2024.100298","DOIUrl":"10.1016/j.osnem.2024.100298","url":null,"abstract":"<div><div>This paper presents a longitudinal study of more than ten years of activity on Instagram consisting of over a million posts by 400 content creators from four countries: the US, Brazil, Netherlands and Germany. Our study shows differences in the professionalisation of content monetisation between countries, yet consistent patterns; significant differences in the frequency of posts yet similar user engagement trends; and significant differences in the disclosure of sponsored content in some countries, with a direct connection with national legislation. We analyse shifts in marketing strategies due to legislative and platform feature changes, focusing on how content creators adapt disclosure methods to different legal environments. We also analyse the impact of disclosures and sponsored posts on engagement and conclude that, although sponsored posts have lower engagement on average, properly disclosing ads does not reduce engagement further. Our observations stress the importance of disclosure compliance and can guide authorities in developing and monitoring them more effectively.</div></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"45 ","pages":"Article 100298"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143095211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How political symbols spread in online social networks: Using agent-based models to replicate the complex contagion of the yellow ribbon in Twitter","authors":"Francisco J. León-Medina","doi":"10.1016/j.osnem.2025.100300","DOIUrl":"10.1016/j.osnem.2025.100300","url":null,"abstract":"<div><div>This paper analyzes the diffusion of the yellow ribbon in Twitter, a political symbol that represents the demand for the release of Catalan prisoners. We gathered data on potential users of the symbol in Twitter (users that publicly backed the cause), including their social network of friendships, and built an agent-based simulation to replicate the diffusion of the symbol in a digital twin version of the observed network. Our hypothesis was that complex contagion is the best explanation of the observed statistical relation between the proportion of adopting neighbors and the probability of adoption. Results show that the complex contagion model outperforms the simple contagion model and generates a better fit between the observed and the simulated pattern when the typical conditions of a complex contagion process are added to the baseline model, that is, when agents are affected by their reference group behavior rather than by the most influential nodes of the network, and when we identify a peripherical and densely connected network community and trigger the process from there. These results widen the set of behaviors whose diffusion can be explained as complex contagion to include adoption in low-risk/low-cost behaviors among people who would usually <em>not</em> resist adoption.</div></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"45 ","pages":"Article 100300"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143095301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why are you traveling? Inferring trip profiles from online reviews and domain-knowledge","authors":"Lucas G.S. Félix, Washington Cunha, Claudio M.V. de Andrade, Marcos André Gonçalves, Jussara M. Almeida","doi":"10.1016/j.osnem.2024.100296","DOIUrl":"10.1016/j.osnem.2024.100296","url":null,"abstract":"<div><div>This paper addresses the task of inferring trip profiles (TPs), which consists of determining the profile of travelers engaged in a particular trip given a set of possible categories. TPs may include working trips, leisure journeys with friends, or family vacations. Travelers with different TPs typically have varied plans regarding destinations and timing. TP inference may provide significant insights for numerous tourism-related services, such as geo-recommender systems and tour planning. We focus on TP inference using TripAdvisor, a prominent tourism-centric social media platform, as our data source. Our goal is to evaluate how effectively we can automatically discern the TP from a user review on this platform. A user review encompasses both textual feedback and domain-specific data (such as a user’s previous visits to the location), which are crucial for accurately characterizing the trip. To achieve this, we assess various feature sets (including text and domain-specific) and implement advanced machine learning models, such as neural Transformers and open-source Large Language Models (Llama 2, Bloom). We examine two variants of the TP inference task—binary and multi-class. Surprisingly, our findings reveal that combining domain-specific features with TF-IDF-based representation in an LGBM model performs as well as more complex Transformer and LLM models, while being much more efficient and interpretable.</div></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"45 ","pages":"Article 100296"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143095300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing prompt-based large language models for disaster monitoring and automated reporting from social media feedback","authors":"Riccardo Cantini, Cristian Cosentino, Fabrizio Marozzo, Domenico Talia, Paolo Trunfio","doi":"10.1016/j.osnem.2024.100295","DOIUrl":"10.1016/j.osnem.2024.100295","url":null,"abstract":"<div><div>In recent years, social media has emerged as one of the main platforms for real-time reporting of issues during disasters and catastrophic events. While great strides have been made in collecting such information, there remains an urgent need to improve user reports’ automation, aggregation, and organization to streamline various tasks, including rescue operations, resource allocation, and communication with the press. This paper introduces an innovative methodology that leverages the power of prompt-based Large Language Models (LLMs) to strengthen disaster response and management. By analyzing large volumes of user-generated content, our methodology identifies issues reported by citizens who have experienced a disastrous event, such as damaged buildings, broken gas pipelines, and flooding. It also localizes all posts containing references to geographic information in the text, allowing for aggregation of posts that occurred nearby. By leveraging these localized citizen-reported issues, the methodology generates insightful reports full of essential information for emergency services, news agencies, and other interested parties. Extensive experimentation on large datasets validates the accuracy and efficiency of our methodology in classifying posts, detecting sub-events, and producing real-time reports. These findings highlight the practical value of prompt-based LLMs in disaster response, emphasizing their flexibility and adaptability in delivering timely insights that support more effective interventions.</div></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"45 ","pages":"Article 100295"},"PeriodicalIF":0.0,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142702794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HaRNaT - A dynamic hashtag recommendation system using news","authors":"Divya Gupta, Shampa Chakraverty","doi":"10.1016/j.osnem.2024.100294","DOIUrl":"10.1016/j.osnem.2024.100294","url":null,"abstract":"<div><div>Microblogging platforms such as <em>X</em> and <em>Mastadon</em> have evolved into significant data sources, where the Hashtag Recommendation System (HRS) is being devised to automate the recommendation of hashtags for user queries. We propose a context-sensitive, Machine Learning based HRS named <em>HaRNaT</em>, that strategically leverages news articles to identify pertinent keywords and subjects related to a query. It interprets the fresh context of a query and tracks the evolving dynamics of hashtags to evaluate their relevance in the present context. In contrast to prior methods that primarily rely on microblog content for hashtag recommendation, <em>HaRNaT</em> mines contextually related microblogs and assesses the relevance of co-occurring hashtags with news information. To accomplish this, it evaluates hashtag features, including pertinence, popularity among users, and association with other hashtags. In performance evaluation of <em>HaRNaT</em> trained on these features demonstrates a macro-averaged precision of 84% with Naive Bayes and 80% with Logistic Regression. Compared to <em>Hashtagify</em>- a hashtag search engine, <em>HaRNaT</em> offers a dynamically evolving set of hashtags.</div></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"45 ","pages":"Article 100294"},"PeriodicalIF":0.0,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142702795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How does user-generated content on Social Media affect stock predictions? A case study on GameStop","authors":"Antonino Ferraro , Giancarlo Sperlì","doi":"10.1016/j.osnem.2024.100293","DOIUrl":"10.1016/j.osnem.2024.100293","url":null,"abstract":"<div><div>One of the main challenges in the financial market concerns the forecasting of stock behavior, which plays a key role in supporting the financial decisions of investors. In recent years, the large amount of available financial data and the heterogeneous contextual information led researchers to investigate data-driven models using Artificial Intelligence (AI)-based approaches for forecasting stock prices. Recent methodologies focus mainly on analyzing participants from Reddit without considering other social media and how their combination affects the stock market, which remains an open challenge. In this paper, we combine financial data and textual user-generated information, which are provided as input to various deep learning models, to develop a stock forecasting system. The main novelties of the proposal concern the design of a multi-modal approach combining historical stock prices and sentiment scores extracted by different Online Social Networks (OSNs), also unveiling possible correlations about heterogeneous information evaluated during the GameStop squeeze. In particular, we have examined several AI-based models and investigated the impact of textual data inferred from well-known Online Social Networks (<em>i.e.</em>, Reddit and Twitter) on stock market behavior by conducting a case study on GameStop. Although users’ dynamic opinions on social networks may have a detrimental impact on the stock prediction task, our investigation has demonstrated the usefulness of assessing user-generated content inferred from various OSNs on the market forecasting problem.</div></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"43 ","pages":"Article 100293"},"PeriodicalIF":0.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142653576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring centralization of online platforms through size and interconnection of communities","authors":"Milo Z. Trujillo, Laurent Hébert-Dufresne, James Bagrow","doi":"10.1016/j.osnem.2024.100292","DOIUrl":"10.1016/j.osnem.2024.100292","url":null,"abstract":"<div><div>Decentralization of online social platforms offers a variety of potential benefits, including divesting of moderator and administrator authority among a wider population, allowing a variety of communities with differing social standards to coexist, and making the platform more resilient to technical or social attack. However, a platform offering a decentralized architecture does not guarantee that users will use it in a decentralized way, and measuring the centralization of socio-technical networks is not an easy task. In this paper we introduce a method of characterizing inter-community influence, to measure the impact that removing a community would have on the remainder of a platform. Our approach provides a careful definition of “centralization” appropriate in bipartite user-community socio-technical networks, and demonstrates the inadequacy of more trivial methods for interrogating centralization such as examining the distribution of community sizes. We use this method to compare the structure of five socio-technical platforms, and find that even decentralized platforms like Mastodon are far more centralized than any synthetic networks used for comparison. We discuss how this method can be used to identify when a platform is more centralized than it initially appears, either through inherent social pressure like assortative preferential attachment, or through astroturfing by platform administrators, and how this knowledge can inform platform governance and user trust.</div></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"43 ","pages":"Article 100292"},"PeriodicalIF":0.0,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowdsourcing the Mitigation of disinformation and misinformation: The case of spontaneous community-based moderation on Reddit","authors":"Giulio Corsi , Elizabeth Seger , Sean Ó hÉigeartaigh","doi":"10.1016/j.osnem.2024.100291","DOIUrl":"10.1016/j.osnem.2024.100291","url":null,"abstract":"<div><div>Community-based content moderation, an approach that utilises user-generated knowledge to shape the ranking and display of online content, is recognised as a potential tool in combating disinformation and misinformation. This study examines this phenomenon on Reddit, which employs a platform-wide content ranking system based on user upvotes and downvotes. By empowering users to influence content visibility, Reddit's system serves as a naturally occurring community moderation mechanism, providing an opportunity to analyse how users engage with this system. Focusing on discussions related to climate change, we observe that in this domain, low-credibility content is spontaneously moderated by Reddit users, although the magnitude of this effect varies across Subreddits. We also identify temporal fluctuations in content removal rates, indicating dynamic and context-dependent patterns influenced by platform policies and socio-political factors. These findings highlight the potential of community-based moderation in mitigating online false information, offering valuable insights for the development of robust social media moderation frameworks.</div></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"43 ","pages":"Article 100291"},"PeriodicalIF":0.0,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}