EPJ Data Science; Pub Date: 2024-07-25; DOI: 10.1140/epjds/s13688-024-00490-1
Marco Mancastroppa, Iacopo Iacopini, Giovanni Petri, Alain Barrat
{"title":"The structural evolution of temporal hypergraphs through the lens of hyper-cores","authors":"Marco Mancastroppa, Iacopo Iacopini, Giovanni Petri, Alain Barrat","doi":"10.1140/epjds/s13688-024-00490-1","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00490-1","url":null,"abstract":"<p>The richness of many complex systems stems from the interactions among their components. The higher-order nature of these interactions, involving many units at once, and their temporal dynamics constitute crucial properties that shape the behaviour of the system itself. An adequate description of these systems is offered by temporal hypergraphs, which integrate these features within the same framework. However, tools for their temporal and topological characterization are still scarce. Here we develop a series of methods specifically designed to analyse the structural properties of temporal hypergraphs at multiple scales. Leveraging the hyper-core decomposition of hypergraphs, we follow the evolution of the hyper-cores through time, characterizing the hypergraph structure and its temporal dynamics at different topological scales, and quantifying the multi-scale structural stability of the system. We also define two static hypercoreness centrality measures that provide an overall description of the nodes’ aggregated structural behaviour. We apply the characterization methods to several data sets, establishing connections between structural properties and specific activities within the systems. Finally, we show how the proposed method can be used as a model-validation tool for synthetic temporal hypergraphs, distinguishing the higher-order structures and dynamics generated by different models from the empirical ones, and thus identifying the model mechanisms essential for reproducing the empirical hypergraph structure and evolution. Our work opens several research directions, from the understanding of dynamic processes on temporal higher-order networks to the design of new models of time-varying hypergraphs.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"16 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141772499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science; Pub Date: 2024-07-17; DOI: 10.1140/epjds/s13688-024-00489-8
Nandini Iyer, Ronaldo Menezes, Hugo Barbosa
{"title":"The role of transport systems in housing insecurity: a mobility-based analysis","authors":"Nandini Iyer, Ronaldo Menezes, Hugo Barbosa","doi":"10.1140/epjds/s13688-024-00489-8","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00489-8","url":null,"abstract":"<p>With trends of urbanisation on the rise, providing adequate housing to individuals remains a complex issue to be addressed. Often, the slow output of relevant housing policies, coupled with quickly increasing housing costs, leaves individuals with the burden of finding housing that is affordable and in a safe location. In this paper, we unveil how transit service to employment hubs, not just housing policies, can prevent individuals from improving their housing conditions. We approach this question in three steps, applying the workflow to 20 cities in the United States of America. First, we propose a comprehensive framework to quantify housing insecurity and assign a housing demographic to each neighbourhood. Second, we use transit-pedestrian networks and public transit timetables (GTFS feeds) to estimate the time it takes to travel between two neighbourhoods using public transportation. Third, we apply geospatial autocorrelation to identify employment hotspots for each housing demographic. Finally, we use stochastic modelling to highlight how commuting to areas associated with better housing conditions results in transit commute times of over an hour in 15 cities. Ultimately, we consider how poor transit access to employment areas compounds the burdens that come with housing insecurity. In doing so, we highlight the importance of understanding how negative outcomes of housing insecurity coincide with various urban mechanisms, particularly emphasising the role that public transportation plays in locking vulnerable demographics into a cycle of poverty.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"26 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141744744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cycling into the workshop: e-bike and m-bike mobility patterns for predictive maintenance in Barcelona’s bike-sharing system","authors":"Jordi Grau-Escolano, Aleix Bassolas, Julian Vicens","doi":"10.1140/epjds/s13688-024-00486-x","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00486-x","url":null,"abstract":"<p>Bike-sharing systems have emerged as a significant element of urban mobility, providing an environmentally friendly transportation alternative. With the increasing integration of electric bikes alongside mechanical bikes, it is crucial to illuminate distinct usage patterns and their impact on maintenance. Accordingly, this research aims to develop a comprehensive understanding of mobility dynamics, distinguishing between different mobility modes, and introducing a novel predictive maintenance system tailored for bikes. By utilising a combination of trip information and maintenance data from Barcelona’s bike-sharing system, Bicing, this study conducts an extensive analysis of mobility patterns and their relationship to failures of bike components. To accurately predict maintenance needs for essential bike parts, this research delves into various mobility metrics and applies statistical and machine learning survival models, including deep learning models. Because these models are complex, and to bolster confidence in the system’s predictions, interpretability techniques are used to explain the main predictors of maintenance needs. The analysis reveals marked differences in the usage patterns of mechanical bikes and electric bikes, with a growing user preference for the latter despite their extra costs. These differences in mobility were found to have a considerable impact on the maintenance needs within the bike-sharing system. Moreover, the predictive maintenance models proved effective in forecasting these maintenance needs, capable of operating across an entire bike fleet. Despite challenges such as approximated bike usage metrics and data imbalances, the study successfully showcases the feasibility of an accurate predictive maintenance system capable of improving operational costs, bike availability, and security.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"3 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141610702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science; Pub Date: 2024-07-10; DOI: 10.1140/epjds/s13688-024-00488-9
Alexander M. Petersen
{"title":"Shift in house price estimates during COVID-19 reveals effect of crisis on collective speculation","authors":"Alexander M. Petersen","doi":"10.1140/epjds/s13688-024-00488-9","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00488-9","url":null,"abstract":"<p>We exploit a city-level panel comprising individual house price estimates to estimate the impact of COVID-19 on both small and big real-estate markets in California, USA. Descriptive analyses of spot house price estimates, including contemporaneous price uncertainty and 30-day price change for individual properties listed on the online real-estate platform Zillow.com, together facilitate quantifying both the excess valuation and valuation confidence attributable to this global socio-economic shock. Our quasi-experimental pre-/post-COVID-19 design spans several years around 2020 and leverages contemporaneous price estimates of rental properties – i.e., off-market real estate entering the habitation market, just not for purchase and hence free of speculation – as an appropriate counterfactual to properties listed for sale, which are subject to on-market speculation. Combining unit-level matching and multivariate difference-in-difference regression approaches, we obtain consistent estimates regarding the sign and magnitude of excess price growth observed after the pandemic onset. Specifically, our results indicate that properties listed for sale appreciated an additional 1% per month above what would be expected in the absence of the pandemic. This corresponds to an excess annual price growth of roughly 12.7 percentage points, which accounts for more than half of the actual annual price growth in 2021 observed across the studied regions. Simultaneously, uncertainty in price estimates decreased, signaling the irrational confidence characteristic of prior asset bubbles. We explore how these two trends are related to market size, local market supply and borrowing costs, which together lend support to the counterintuitive roles of uncertainty and interruptions in decision-making.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"54 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141588230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science; Pub Date: 2024-07-05; DOI: 10.1140/epjds/s13688-024-00487-w
Chengling Tang, Lei Dong, Hao Guo, Xuechen Wang, Xiao-Jian Chen, Quanhua Dong, Yu Liu
{"title":"Downscaling spatial interaction with socioeconomic attributes","authors":"Chengling Tang, Lei Dong, Hao Guo, Xuechen Wang, Xiao-Jian Chen, Quanhua Dong, Yu Liu","doi":"10.1140/epjds/s13688-024-00487-w","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00487-w","url":null,"abstract":"<p>A variety of complex socioeconomic phenomena, such as migration, commuting, and trade, can be abstracted as spatial interaction networks, where nodes represent geographic locations and weighted edges convey the interaction and its strength. However, obtaining fine-grained spatial interaction data is very challenging in practice due to limitations in collection methods and costs, so spatial interaction data such as transportation data and trade data are often only available at a coarse scale. Here, we propose a gravity downscaling (GD) method based on readily accessible socioeconomic data and the gravity law to infer fine-grained interactions from coarse-grained data. GD assumes that interactions at different spatial scales are governed by a similar gravity law, and can thus transfer the parameters estimated from coarse-grained regions to fine-grained regions. Results show that GD has an average improvement of 24.6% in Mean Absolute Percentage Error over alternative downscaling methods (i.e., the areal-weighted method and machine learning models) across datasets with different spatial scales and in various regions. Using simple assumptions, GD enables accurate downscaling of spatial interactions, making it applicable to a wide range of fields, including human mobility, transportation, and trade.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"24 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141549802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science; Pub Date: 2024-06-28; DOI: 10.1140/epjds/s13688-024-00483-0
Minje Choi, Daniel M. Romero, David Jurgens
{"title":"Profile update: the effects of identity disclosure on network connections and language","authors":"Minje Choi, Daniel M. Romero, David Jurgens","doi":"10.1140/epjds/s13688-024-00483-0","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00483-0","url":null,"abstract":"<p>Our social identities determine how we interact and engage with the world surrounding us. In online settings, individuals can make these identities explicit by including them in their public biography, possibly signaling a change in what is important to them and how they should be viewed. While there is evidence suggesting the impact of intentional identity disclosure in online social platforms, its actual effect on engagement activities at the user level has yet to be explored. Here, we perform the first large-scale study on Twitter that examines behavioral changes following identity disclosure on Twitter profiles. Combining social networks with methods from natural language processing and quasi-experimental analyses, we discover that after disclosing an identity on their profiles, users (1) tweet and retweet more in a way that aligns with their respective identities, and (2) connect more with users that disclose similar identities. We also examine whether disclosing the identity increases the chance of being targeted for offensive comments and find that in fact (3) the combined effect of disclosing identity via both tweets and profiles is associated with a reduced number of offensive replies from others. Our findings highlight that the decision to disclose one’s identity in online spaces can lead to substantial changes in how users express themselves or forge connections, with a lesser degree of negative consequences than anticipated.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"29 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing user reactions using relevance between location information of tweets and news articles","authors":"Yun-Tae Jin, JaeBeom You, Shoko Wakamiya, Hyuk-Yoon Kwon","doi":"10.1140/epjds/s13688-024-00465-2","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00465-2","url":null,"abstract":"<p>In this study, we analyze the extent of users’ reactions to news articles based on their tweets, demonstrating the potential for home location prediction. To achieve this, we quantify users’ reactions to specific news articles based on the textual similarity between tweets and news articles, showcasing that users’ reactions to news articles about their cities are significantly higher than those about other cities. To maximize the difference in reactions, we introduce the concept of <i>News Distinctness</i>, which highlights the news articles that affect a specific location. By incorporating News Distinctness with users’ reactions to the news, we magnify its effects. Through experiments conducted with tweets collected from users whose home locations are in five representative cities within the United States and news articles describing events occurring in those cities, we observed a 6.75% to 40% improvement in the reaction score when compared to the average reaction towards news from outside the home location, clearly predicting the home location. Furthermore, News Distinctness increases the difference in reaction score between news in the home location and the average of the news outside of the home location by 12% to 194%. These results demonstrate that our proposed idea can be utilized to predict users’ locations, potentially recommending meaningful information based on the users’ areas of interest.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"10 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science; Pub Date: 2024-06-07; DOI: 10.1140/epjds/s13688-024-00481-2
Marco Bronzini, Carlo Nicolini, Bruno Lepri, Andrea Passerini, Jacopo Staiano
{"title":"Glitter or gold? Deriving structured insights from sustainability reports via large language models","authors":"Marco Bronzini, Carlo Nicolini, Bruno Lepri, Andrea Passerini, Jacopo Staiano","doi":"10.1140/epjds/s13688-024-00481-2","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00481-2","url":null,"abstract":"<p>Over the last decade, several regulatory bodies have started requiring the disclosure of non-financial information from publicly listed companies, in light of investors’ increasing attention to Environmental, Social, and Governance (ESG) issues. Publicly released information on sustainability practices is often disclosed in diverse, unstructured, and multi-modal documentation. This poses a challenge in efficiently gathering and aligning the data into a unified framework to derive insights related to Corporate Social Responsibility (CSR). Thus, using Information Extraction (IE) methods becomes an intuitive choice for delivering insightful and actionable data to stakeholders. In this study, we employ Large Language Models (LLMs), In-Context Learning, and the Retrieval-Augmented Generation (RAG) paradigm to extract structured insights related to ESG aspects from companies’ sustainability reports. We then leverage graph-based representations to conduct statistical analyses concerning the extracted insights. These analyses revealed that ESG criteria cover a wide range of topics, exceeding 500, often beyond those considered in existing categorizations, and are addressed by companies through a variety of initiatives. Moreover, disclosure similarities emerged among companies from the same region or sector, validating ongoing hypotheses in the ESG literature. Lastly, by incorporating additional company attributes into our analyses, we investigated which factors most affect companies’ ESG ratings, showing that ESG disclosure affects the obtained ratings more than other financial or company data.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"64 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141549803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science; Pub Date: 2024-06-05; DOI: 10.1140/epjds/s13688-024-00480-3
Pau Muñoz, Alejandro Bellogín, Raúl Barba-Rojas, Fernando Díez
{"title":"Quantifying polarization in online political discourse","authors":"Pau Muñoz, Alejandro Bellogín, Raúl Barba-Rojas, Fernando Díez","doi":"10.1140/epjds/s13688-024-00480-3","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00480-3","url":null,"abstract":"<p>In an era of increasing political polarization, its analysis becomes crucial for the understanding of democratic dynamics. This paper presents comprehensive research on measuring political polarization on X (Twitter) during election cycles in Spain, from 2011 to 2019. A broad comparative analysis is performed on algorithms used to identify and measure polarization or controversy on microblogging platforms. This analysis is specifically tailored towards publications made by official political party accounts during pre-campaign, campaign, election day, and the week post-election. Guided by the findings of this comparative evaluation, we propose a novel algorithm better suited to capture polarization in the context of political events, which is validated with real data. As a consequence, our research contributes a significant advancement in the fields of political science, social network analysis, and computational social science more broadly, by providing a realistic method to capture polarization from online political discourse.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"69 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science; Pub Date: 2024-05-17; DOI: 10.1140/epjds/s13688-024-00476-z
Oleg Sobchuk, Mason Youngblood, Olivier Morin
{"title":"First-mover advantage in music","authors":"Oleg Sobchuk, Mason Youngblood, Olivier Morin","doi":"10.1140/epjds/s13688-024-00476-z","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00476-z","url":null,"abstract":"<p>Why do some songs and musicians become successful while others do not? We show that one of the reasons may be the “first-mover advantage”: artists that stand at the foundation of new music genres tend to be more successful than those who join these genres later on. To test this hypothesis, we have analyzed a massive dataset of over 920,000 songs, including 110 music genres: 10 chosen intentionally and preregistered, and 100 chosen randomly. For this, we collected the data from two music services: Spotify, which provides detailed information about songs’ success (the precise number of times each song was listened to), and Every Noise at Once, which provides detailed genre tags for musicians. Of the 110 genres, 91 show the first-mover advantage, clearly suggesting that it is an important mechanism in music success and evolution.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"14 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141064173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}