EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-04-17 DOI: 10.1140/epjds/s13688-025-00541-1
Nnaemeka Ohamadike, Kevin Durrheim, Mpho Primus
{"title":"Whose voice matters? Word embeddings reveal identity bias in news quotes.","authors":"Nnaemeka Ohamadike, Kevin Durrheim, Mpho Primus","doi":"10.1140/epjds/s13688-025-00541-1","DOIUrl":"https://doi.org/10.1140/epjds/s13688-025-00541-1","url":null,"abstract":"<p><p>This paper investigates identity bias (gender and race) in the South African news selection and representation of COVID-19 vaccination quotes. Social bias studies have qualitatively examined race and gender bias in South African news, given South Africa's apartheid history; yet, studies that examine and quantify these biases at the speaker level using news quotes from a representative South African news corpus remain limited. To address this gap, we examined race and gender bias in news selection and framing of quotes. We used word embeddings trained on 22,627 vaccination quotes from 76 South African news sources between 2020 and 2023. These large-scale processing embeddings are unbiased by design but can learn and uncover biases hidden in language. Our findings reveal gender and race bias in the news selection and framing of quotes: journalists privilege White voices as more authoritative and connected to global and technical vaccination discourse but confine Black voices to primarily localised contexts. They also quote male speakers more frequently in the news than female speakers. 
In an era where human biases are becoming increasingly implicit, we argue that embeddings offer a robust tool to unearth, monitor, and evaluate these biases at the micro or speaker level in the news.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1140/epjds/s13688-025-00541-1.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"14 1","pages":"30"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12006212/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143974850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Safe spaces or toxic places? Content moderation and social dynamics of online eating disorder communities.","authors":"Kristina Lerman, Minh Duc Chu, Charles Bickham, Luca Luceri, Emilio Ferrara","doi":"10.1140/epjds/s13688-025-00575-5","DOIUrl":"10.1140/epjds/s13688-025-00575-5","url":null,"abstract":"<p><p>Social media platforms have become critical spaces for discussing mental health concerns, including eating disorders. While these platforms can provide valuable support networks, they may also amplify harmful content that glorifies disordered cognition and self-destructive behaviors. While social media platforms have implemented various content moderation strategies, from stringent to laissez-faire approaches, we lack a comprehensive understanding of how these different moderation practices interact with user engagement in online communities around these sensitive mental health topics. This study addresses this knowledge gap through a comparative analysis of eating disorder discussions across Twitter/X (2.6M tweets), Reddit (178K submissions), and TikTok (14K videos) spanning 2019 to 2023. Our findings reveal that while users across all platforms engage similarly in expressing concerns and seeking support, platforms with weaker moderation (like Twitter/X) enable the formation of toxic echo chambers that amplify pro-anorexia rhetoric. 
These results demonstrate how moderation strategies significantly influence the development and impact of online communities, particularly in contexts involving mental health and self-harm.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"14 1","pages":"55"},"PeriodicalIF":2.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12296748/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144728944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-03-12 DOI: 10.1140/epjds/s13688-025-00521-5
Lea Karbevska, César A Hidalgo
{"title":"Mapping global value chains at the product level.","authors":"Lea Karbevska, César A Hidalgo","doi":"10.1140/epjds/s13688-025-00521-5","DOIUrl":"https://doi.org/10.1140/epjds/s13688-025-00521-5","url":null,"abstract":"<p><p>Value chain data is crucial for navigating economic disruptions. Yet, despite its importance, we lack publicly available product-level value chain datasets, since resources such as the \"World Input-Output Database\", \"Inter-Country Input-Output Tables\", \"EXIOBASE\", and \"EORA\", lack information about products (e.g. Radio Receivers, Telephones, Electrical Capacitors, LCDs, etc.) and instead rely on aggregate industrial sectors (e.g. Electrical Equipment, Telecommunications). Here, we introduce a method that leverages ideas from machine learning and trade theory to infer product-level value chain relationships from fine-grained international trade data. We apply our method to data summarizing the exports and imports of 1200+ products and 250+ world regions (e.g. states in the U.S., prefectures in Japan, etc.) to infer value chain information implicit in their trade patterns. In short, we leverage the idea that due to global value chains, regions specialized in the export of a product will tend to specialize in the import of its inputs. We use this idea to develop a novel proportional allocation model to estimate product-level trade flows between regions and countries. 
This contributes a method to approximate value chain data at the product level that should be of interest to people working in logistics, trade, and sustainable development.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1140/epjds/s13688-025-00521-5.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"14 1","pages":"21"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11903633/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143647657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-05-21 DOI: 10.1140/epjds/s13688-025-00539-9
Kathyrn R Fair, Omar A Guerrero
{"title":"Endogenous labour flow networks.","authors":"Kathyrn R Fair, Omar A Guerrero","doi":"10.1140/epjds/s13688-025-00539-9","DOIUrl":"10.1140/epjds/s13688-025-00539-9","url":null,"abstract":"<p><p>In the last decade, the study of labour dynamics has led to the introduction of labour flow networks (LFNs) as a way to conceptualise job-to-job transitions, and to the development of mathematical models to explore the dynamics of these networked flows. To date, LFN models have relied upon an assumption of static network structure. However, as recent events (increasing automation in the workplace, the COVID-19 pandemic, a surge in the demand for programming skills, etc.) have shown, we are experiencing drastic shifts in the job landscape that are altering the ways individuals navigate the labour market. Here we develop a novel model in which LFNs emerge from agent-level behaviour, removing the necessity of assuming that future job-to-job flows will be along the same paths where they have been historically observed. This model, informed by economic theory and microdata for the United Kingdom, generates empirical LFNs with a high level of accuracy. We use the model to explore how shocks impacting the underlying distributions of jobs and wages alter the topology of the LFN. 
This framework represents a crucial step towards the development of models that can answer questions about the future of work in an ever-changing world.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1140/epjds/s13688-025-00539-9.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"14 1","pages":"39"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12095427/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144141659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-05-23 DOI: 10.1140/epjds/s13688-025-00556-8
Liam Burke-Moore, Angus R Williams, Jonathan Bright
{"title":"Journalists are most likely to receive abuse: analysing online abuse of UK public figures across sport, politics, and journalism on Twitter.","authors":"Liam Burke-Moore, Angus R Williams, Jonathan Bright","doi":"10.1140/epjds/s13688-025-00556-8","DOIUrl":"10.1140/epjds/s13688-025-00556-8","url":null,"abstract":"<p><p>Engaging with online social media platforms is an important part of life as a public figure in modern society, enabling connection with broad audiences and providing a platform for spreading ideas. However, public figures are often disproportionate recipients of hate and abuse on these platforms, degrading public discourse. While significant research on abuse received by groups such as politicians and journalists exists, little has been done to understand the differences in the dynamics of abuse across different groups of public figures, systematically and at scale. To address this, we present analysis of a novel dataset of 45.5M tweets targeted at 4602 UK public figures across 3 domains (members of parliament, footballers, journalists), labelled using fine-tuned transformer-based language models. We find that MPs receive more abuse in absolute terms, but that journalists are most likely to receive abuse after controlling for other factors. We show that abuse is unevenly distributed in all groups, with a small number of individuals receiving the majority of abuse, and that for some groups, abuse is more temporally uneven, being driven by specific events, particularly for footballers. 
We also find that a more prominent online presence and being male are indicative of higher levels of abuse across all 3 domains.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"14 1","pages":"41"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12102095/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144141660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-08-27 DOI: 10.1140/epjds/s13688-025-00582-6
Ghenai Amira, Nath Keshav, Satsangi Aarat
{"title":"AGECovP: identifying ageism and analyzing COVID-19 discourse on older adults in YouTube.","authors":"Ghenai Amira, Nath Keshav, Satsangi Aarat","doi":"10.1140/epjds/s13688-025-00582-6","DOIUrl":"10.1140/epjds/s13688-025-00582-6","url":null,"abstract":"<p><p>The COVID-19 pandemic significantly impacted older adults, generating widespread online discussions that revealed how this at-risk population was perceived. Understanding these portrayals is essential, as public discourse influences societal perceptions of aging and impacts policies and practices affecting older adults. Past research highlights that ageist stereotypes and attitudes frequently surface in public discussions, shaping the experiences of older individuals. The current study presents AGECovP, a comprehensive dataset featuring a diverse collection of videos from YouTube, a leading social media platform. AGECovP is designed to provide researchers with meaningful insights into how older adults were portrayed during the pandemic and how topics such as conspiracy theories, misinformation, and the anti-vaccine movement were framed in relation to aging populations. In addition, the dataset includes a set of labeled comments indicating the presence of ageist content, enabling researchers to perform ageism detection and analyze ageism in online discourse. By providing a resource for examining both overt and subtle forms of ageism, AGECovP contributes to the development of tools and methodologies for addressing bias against older adults. This dataset fosters actionable insights into societal attitudes, enhancing the development of inclusive policies and interventions. 
Our data is available at: https://zenodo.org/records/15800324.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"14 1","pages":"65"},"PeriodicalIF":2.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12390874/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144947648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating work engagement from online chat tools","authors":"Hiroaki Tanaka, Wataru Yamada, Keiichi Ochiai, Shoko Wakamiya, Eiji Aramaki","doi":"10.1140/epjds/s13688-024-00496-9","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00496-9","url":null,"abstract":"<p>The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has transformed our lives. To combat the spread of the infection, remote work has become a widespread practice. However, this shift has led to various work-related problems, including prolonged working hours, mental health issues, and communication difficulties. One particular challenge faced by team members is the inability to accurately gauge the work engagement (WE) levels of subordinates, such as their absorption, dedication, and vigor, due to the limited number of in-person interactions that occur in remote work settings. To address this issue, online communication systems utilizing text-based chat tools such as Slack and Microsoft Teams have gained popularity as substitutes for face-to-face communication. In this paper, we propose a novel approach that uses graph neural networks (GNNs) to estimate the work engagement levels (WELs) of users on text-based chat platforms. Specifically, our method involves embedding users in a feature space based solely on the structural information of the utilized communication network, without considering the contents of the conversations that take place. We conduct two studies using Slack data to evaluate our proposal. The first study reveals that the properties of communication networks play a more significant role when estimating WELs than do conversation contents. Building upon this result, the second study involves the development of a machine learning model that estimates WELs using only the architectural features of the employed communication network. 
In this network representation, each node corresponds to a human user, and edges represent communication logs; i.e., if person A talks to person B, an edge is drawn between node A and node B. Notably, our model achieves a correlation coefficient of 0.60 between the observed and predicted WEL values. Importantly, our proposed approach relies solely on communication network data and does not require linguistic information. This makes it particularly valuable for real-world business situations.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"1 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science Pub Date: 2024-08-08 DOI: 10.1140/epjds/s13688-024-00493-y
Sofía M. del Pozo, Sebastián Pinto, Matteo Serafino, Lucio Garcia, Hernán A. Makse, Pablo Balenzuela
{"title":"Analyzing user ideologies and shared news during the 2019 Argentinian elections","authors":"Sofía M. del Pozo, Sebastián Pinto, Matteo Serafino, Lucio Garcia, Hernán A. Makse, Pablo Balenzuela","doi":"10.1140/epjds/s13688-024-00493-y","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00493-y","url":null,"abstract":"<p>The extensive data generated on social media platforms allow us to gain insights into trending topics and public opinions. Additionally, they offer a window into user behavior, including their content engagement and news sharing habits. In this study, we analyze the relationship between users’ political ideologies and the news they share during Argentina’s 2019 election period. Our findings reveal that users predominantly share news that aligns with their political beliefs, despite accessing media outlets with diverse political leanings. Moreover, we observe a consistent pattern of users sharing articles related to topics biased towards their preferred candidates, highlighting a deeper level of political alignment in online discussions. We believe that this systematic analysis framework can be applied to similar scenarios in different countries, especially those marked by significant political polarization, akin to Argentina.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"57 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science Pub Date: 2024-08-05 DOI: 10.1140/epjds/s13688-024-00492-z
Rohit Ram, Marian-Andrei Rizoiu
{"title":"Empirically measuring online social influence","authors":"Rohit Ram, Marian-Andrei Rizoiu","doi":"10.1140/epjds/s13688-024-00492-z","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00492-z","url":null,"abstract":"<p>Social influence pervades our everyday lives and lays the foundation for complex social phenomena, such as the spread of misinformation and the polarization of communities. A disconnect appears between psychology approaches, generally performed and tested in controlled lab experiments, and quantitative methods, which are usually data-driven and rely on network and event analysis. The former are slow, expensive to deploy, and typically do not generalize well to topical issues; the latter often oversimplify the complexities of social influence and ignore psychosocial literature. This work bridges this gap by introducing a human-in-the-loop active learning method that empirically quantifies social influence by crowdsourcing pairwise influence comparisons. We develop simulation and fitting tools, allowing us to estimate the required budget based on the design features and the worker’s decision accuracy. We perform a series of pilot studies to quantify the impact of design features on worker accuracy. We deploy our method to estimate the influence ranking of 500 X/Twitter users. 
We validate our measure by showing that the obtained empirical influence is tightly linked with agency and communion, the Big Two of social cognition, with agency being the most important dimension for influence formation.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"3 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science Pub Date: 2024-08-01 DOI: 10.1140/epjds/s13688-024-00484-z
Ambra Amico, Giacomo Vaccario, Frank Schweitzer
{"title":"Efficiency and resilience: key drivers of distribution network growth","authors":"Ambra Amico, Giacomo Vaccario, Frank Schweitzer","doi":"10.1140/epjds/s13688-024-00484-z","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00484-z","url":null,"abstract":"<p>Networks to distribute goods, from raw materials to food and medicines, are the backbone of a functioning economy. They are shaped by several supply relations connecting manufacturers, distributors, and final buyers worldwide. We present a network-based model to describe the mechanisms underlying the emergence and growth of distribution networks. In our model, firms consider two practices when establishing new supply relations: centralization, the tendency to choose highly connected partners, and multi-sourcing, the preference for multiple suppliers. Centralization enhances network efficiency by leveraging short distribution paths; multi-sourcing fosters resilience by providing multiple distribution paths connecting final buyers to the manufacturer. We validate the proposed model using data on drug shipments in the US. Drawing on these data, we reconstruct 22 nationwide pharmaceutical distribution networks. We demonstrate that the proposed model successfully replicates several structural features of the empirical networks, including their out-degree and path length distributions as well as their resilience and efficiency properties. 
These findings suggest that the proposed firm-level practices effectively capture the network growth process that leads to the observed structures.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"21 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141881844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}