EPJ Data Science: Latest Articles

Whose voice matters? Word embeddings reveal identity bias in news quotes.

IF 3 | Zone 2, Computer Science

EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-04-17 DOI: 10.1140/epjds/s13688-025-00541-1

Nnaemeka Ohamadike, Kevin Durrheim, Mpho Primus

Abstract: This paper investigates identity bias (gender and race) in the South African news selection and representation of COVID-19 vaccination quotes. Social bias studies have qualitatively examined race and gender bias in South African news, given South Africa's apartheid history; yet, studies that examine and quantify these biases at the speaker level using news quotes from a representative South African news corpus remain limited. To address this gap, we examined race and gender bias in news selection and framing of quotes. We used word embeddings trained on 22,627 vaccination quotes from 76 South African news sources between 2020 and 2023. These large-scale processing embeddings are unbiased by design but can learn and uncover biases hidden in language. Our findings reveal gender and race bias in the news selection and framing of quotes: journalists privilege White voices as more authoritative and connected to global and technical vaccination discourse but confine black voices to primarily localised contexts. They also quote male speakers more frequently in the news than females. In an era where human biases are becoming increasingly implicit, we argue that embeddings offer a robust tool to unearth, monitor, and evaluate these biases at the micro or speaker level in the news. Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00541-1.

EPJ Data Science, volume 14(1), pages 30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12006212/pdf/

Citations: 0

Safe spaces or toxic places? Content moderation and social dynamics of online eating disorder communities.

IF 2.5 | Zone 2, Computer Science

EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-07-25 DOI: 10.1140/epjds/s13688-025-00575-5

Kristina Lerman, Minh Duc Chu, Charles Bickham, Luca Luceri, Emilio Ferrara

Abstract: Social media platforms have become critical spaces for discussing mental health concerns, including eating disorders. While these platforms can provide valuable support networks, they may also amplify harmful content that glorifies disordered cognition and self-destructive behaviors. While social media platforms have implemented various content moderation strategies, from stringent to laissez-faire approaches, we lack a comprehensive understanding of how these different moderation practices interact with user engagement in online communities around these sensitive mental health topics. This study addresses this knowledge gap through a comparative analysis of eating disorder discussions across Twitter/X (2.6M tweets), Reddit (178K submissions), and TikTok (14K videos) spanning 2019-2023. Our findings reveal that while users across all platforms engage similarly in expressing concerns and seeking support, platforms with weaker moderation (like Twitter/X) enable the formation of toxic echo chambers that amplify pro-anorexia rhetoric. These results demonstrate how moderation strategies significantly influence the development and impact of online communities, particularly in contexts involving mental health and self-harm.

EPJ Data Science, volume 14(1), pages 55. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12296748/pdf/

Citations: 0

Socioeconomic disparities in mobility behavior during the COVID-19 pandemic in developing countries.

IF 3 | Zone 2, Computer Science

EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-03-24 DOI: 10.1140/epjds/s13688-025-00532-2

Lorenzo Lucchini, Ollin D Langle-Chimal, Lorenzo Candeago, Lucio Melito, Alex Chunet, Aleister Montfort, Bruno Lepri, Nancy Lozano-Gracia, Samuel P Fraiberger

Abstract: Mobile phone data have played a key role in quantifying human mobility during the COVID-19 pandemic. Existing studies on mobility patterns have primarily focused on regional aggregates in high-income countries, obfuscating the accentuated impact of the pandemic on the most vulnerable populations. Leveraging geolocation data from mobile-phone users and population census for 6 middle-income countries across 3 continents between March and December 2020, we uncovered common disparities in the behavioral response to the pandemic across socioeconomic groups. Users living in low-wealth neighborhoods were less likely to respond by self-isolating, relocating to rural areas, or refraining from commuting to work. The gap in the behavioral responses between socioeconomic groups persisted during the entire observation period. Among users living in low-wealth neighborhoods, those who commuted to work in high-wealth neighborhoods pre-pandemic were particularly at risk of experiencing economic stress, facing both the reduction in economic activity in the high-wealth neighborhood and being more likely to be affected by public transport closures due to their longer commute distances. While confinement policies were predominantly country-wide, these results suggest that, when data to identify vulnerable individuals are not readily available, GPS-based analytics could help design targeted place-based policies to aid the most vulnerable. Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00532-2.

EPJ Data Science, volume 14(1), pages 25. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11933202/pdf/

Citations: 0

Detection of anomalous spatio-temporal patterns of app traffic in response to catastrophic events.

IF 3 | Zone 2, Computer Science

EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-05-06 DOI: 10.1140/epjds/s13688-025-00546-w

Sofia Medina, Shazia'Ayn Babul, Timothy LaRock, Rohit Sahasrabuddhe, Renaud Lambiotte, Nicola Pedreschi

Abstract: In this work, we uncover patterns of mobile phone application usage and information spread in response to perturbations caused by unprecedented events. We focus on categorizing patterns of response in both space and time, tracking their relaxation over time. To this end, we use the NetMob2023 Data Challenge dataset, which provides mobile phone application traffic volume data for several cities in France at a spatial resolution of 100 m² and a time resolution of 15 minutes for a time period ranging from March to May 2019. We analyze the spread of information before, during, and after the catastrophic Notre-Dame fire on April 15th and a bombing that took place in the city centre of Lyon on May 24th, using the volume of data uploaded to and downloaded from different mobile applications as a proxy of information transfer dynamics. We identify different clusters of information transfer dynamics in response to the Notre-Dame fire within the city of Paris as well as in other major French cities. We find a clear pattern of significantly above-baseline usage of the application Twitter (currently known as X) in Paris that radially spreads from the area surrounding the Notre-Dame cathedral to the rest of the city. We detect a similar pattern in the city of Lyon in response to the bombing. Further, we present a null model of radial information spread and develop methods of tracking radial patterns over time. Overall, we illustrate the novel analytical methods we devise, showing how they enable a new perspective on mobile phone user response to unplanned catastrophic events and giving insight into how information spreads during a catastrophe in both time and space. Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00546-w.

EPJ Data Science, volume 14(1), pages 35. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12055615/pdf/

Citations: 0

Milgram's experiment in the knowledge space: individual navigation strategies.

IF 3 | Zone 2, Computer Science

EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-06-05 DOI: 10.1140/epjds/s13688-025-00558-6

Manran Zhu, János Kertész

Abstract: The data deluge characteristic of our times has led to information overload, posing a significant challenge to effectively finding our way through the digital landscape. Addressing this issue requires an in-depth understanding of how we navigate through the abundance of information. Previous research has discovered multiple patterns in how individuals navigate in the geographic, social, and information spaces, yet individual differences in strategies for navigation in the knowledge space have remained largely unexplored. To bridge the gap, we conducted an online experiment where participants played a navigation game on Wikipedia and completed questionnaires about their personal information. Utilizing the hierarchical structure of the English Wikipedia and a graph embedding trained on it, we identified two navigation strategies and found significant individual differences in the choice between them. Older, white and female participants tend to adopt a proximity-driven strategy, while younger participants prefer a hub-driven strategy. Our study connects social navigation to knowledge navigation: individuals' differing tendencies to use geographical and occupational information about the target person to navigate in the social space can be understood as different choices between the hub-driven and proximity-driven strategies in the knowledge space.

EPJ Data Science, volume 14(1), pages 42. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12141110/pdf/

Citations: 0

Mapping global value chains at the product level.

IF 3 | Zone 2, Computer Science

EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-03-12 DOI: 10.1140/epjds/s13688-025-00521-5

Lea Karbevska, César A Hidalgo

Abstract: Value chain data is crucial for navigating economic disruptions. Yet, despite its importance, we lack publicly available product-level value chain datasets, since resources such as the "World Input-Output Database", "Inter-Country Input-Output Tables", "EXIOBASE", and "EORA" lack information about products (e.g. Radio Receivers, Telephones, Electrical Capacitors, LCDs, etc.) and instead rely on aggregate industrial sectors (e.g. Electrical Equipment, Telecommunications). Here, we introduce a method that leverages ideas from machine learning and trade theory to infer product-level value chain relationships from fine-grained international trade data. We apply our method to data summarizing the exports and imports of 1200+ products and 250+ world regions (e.g. states in the U.S., prefectures in Japan, etc.) to infer value chain information implicit in their trade patterns. In short, we leverage the idea that due to global value chains, regions specialized in the export of a product will tend to specialize in the import of its inputs. We use this idea to develop a novel proportional allocation model to estimate product-level trade flows between regions and countries. This contributes a method to approximate value chain data at the product level that should be of interest to people working in logistics, trade, and sustainable development. Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00521-5.

EPJ Data Science, volume 14(1), pages 21. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11903633/pdf/

Citations: 0

Endogenous labour flow networks.

IF 3 | Zone 2, Computer Science

EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-05-21 DOI: 10.1140/epjds/s13688-025-00539-9

Kathyrn R Fair, Omar A Guerrero

Abstract: In the last decade, the study of labour dynamics has led to the introduction of labour flow networks (LFNs) as a way to conceptualise job-to-job transitions, and to the development of mathematical models to explore the dynamics of these networked flows. To date, LFN models have relied upon an assumption of static network structure. However, as recent events (increasing automation in the workplace, the COVID-19 pandemic, a surge in the demand for programming skills, etc.) have shown, we are experiencing drastic shifts in the job landscape that are altering the ways individuals navigate the labour market. Here we develop a novel model in which LFNs emerge from agent-level behaviour, removing the necessity of assuming that future job-to-job flows will be along the same paths where they have been historically observed. This model, informed by economic theory and microdata for the United Kingdom, generates empirical LFNs with a high level of accuracy. We use the model to explore how shocks impacting the underlying distributions of jobs and wages alter the topology of the LFN. This framework represents a crucial step towards the development of models that can answer questions about the future of work in an ever-changing world. Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00539-9.

EPJ Data Science, volume 14(1), pages 39. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12095427/pdf/

Citations: 0

Understanding stock market instability via graph auto-encoders.

IF 3 | Zone 2, Computer Science

EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-02-19 DOI: 10.1140/epjds/s13688-025-00523-3

Dragos Gorduza, Stefan Zohren, Xiaowen Dong

Abstract: Understanding stock market instability is a key question in financial management as practitioners seek to forecast breakdowns in long-run asset co-movement patterns which expose portfolios to rapid and devastating collapses in value. These disruptions are linked to changes in the structure of market-wide stock correlations which increase the risk of high volatility shocks. The structure of these co-movements can be described as a network where companies are represented by nodes while edges capture correlations between their price movements. Co-movement breakdowns then manifest as abrupt changes in the topological structure of this network. Measuring the scale of this change and learning a timely indicator of breakdowns is central to understanding both financial stability and volatility forecasting. We propose to use the edge reconstruction accuracy of a graph auto-encoder as an indicator for how homogeneous connections between assets are, which we use, based on the literature of financial network analysis, as a proxy to infer market volatility. We show, through our experiments on the Standard and Poor's index over the 2015-2022 period, that the reconstruction errors from our model correlate with volatility spikes and can be used to improve out-of-sample autoregressive modeling of volatility. Our results demonstrate that market instability can be predicted by changes in the homogeneity in connections of the financial network, which expands the understanding of instability in the stock market. We discuss the implications of this graph machine learning-based volatility estimation for policy targeted at ensuring financial market stability.

EPJ Data Science, volume 14(1), pages 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11839781/pdf/

Citations: 0

Weakly supervised veracity classification with LLM-predicted credibility signals.

IF 3 | Zone 2, Computer Science

EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-02-21 DOI: 10.1140/epjds/s13688-025-00534-0

João A Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton

Abstract: Credibility signals represent a wide range of heuristics typically used by journalists and fact-checkers to assess the veracity of online content. Automating the extraction of credibility signals presents significant challenges due to the necessity of training high-accuracy, signal-specific extractors, coupled with the lack of sufficiently large annotated datasets. This paper introduces Pastel (Prompted weAk Supervision wiTh crEdibility signaLs), a weakly supervised approach that leverages large language models (LLMs) to extract credibility signals from web content, and subsequently combines them to predict the veracity of content without relying on human supervision. We validate our approach using four article-level misinformation detection datasets, demonstrating that Pastel outperforms zero-shot veracity detection by 38.3% and achieves 86.7% of the performance of the state-of-the-art system trained with human supervision. Moreover, in cross-domain settings where training and testing datasets originate from different domains, Pastel significantly outperforms the state-of-the-art supervised model by 63%. We further study the association between credibility signals and veracity, and perform an ablation study showing the impact of each signal on model performance. Our findings reveal that 12 out of the 19 proposed signals exhibit strong associations with veracity across all datasets, while some signals show domain-specific strengths. Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00534-0.

EPJ Data Science, volume 14(1), pages 16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11845407/pdf/

Citations: 0

When dialects collide: how socioeconomic mixing affects language use.

IF 3 | Zone 2, Computer Science

EPJ Data Science Pub Date: 2025-01-01 Epub Date: 2025-07-10 DOI: 10.1140/epjds/s13688-025-00563-9

Thomas Louf, José J Ramasco, David Sánchez, Márton Karsai

Abstract: The socioeconomic background of people and how they use standard forms of language are not independent, as demonstrated in various sociolinguistic studies. However, the extent to which these correlations may be influenced by the mixing of people from different socioeconomic classes remains relatively unexplored from a quantitative perspective. In this work we leverage geotagged tweets and transferable computational methods to map deviations from standard English across eight UK metropolitan areas. We combine these data with high-resolution income maps to assign a proxy socioeconomic indicator to home-located users. Strikingly, we find a consistent pattern suggesting that the more different socioeconomic classes mix, the less interdependent the frequency of their departures from standard grammar and their income become. Further, we propose an agent-based model of linguistic variety adoption that sheds light on the mechanisms that produce the observations seen in the data. Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00563-9.

EPJ Data Science, volume 14(1), pages 47. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12245997/pdf/

Citations: 0