{"title":"Cycling into the workshop: e-bike and m-bike mobility patterns for predictive maintenance in Barcelona’s bike-sharing system","authors":"Jordi Grau-Escolano, Aleix Bassolas, Julian Vicens","doi":"10.1140/epjds/s13688-024-00486-x","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00486-x","url":null,"abstract":"<p>Bike-sharing systems have emerged as a significant element of urban mobility, providing an environmentally friendly transportation alternative. With the increasing integration of electric bikes alongside mechanical bikes, it is crucial to illuminate distinct usage patterns and their impact on maintenance. Accordingly, this research aims to develop a comprehensive understanding of mobility dynamics, distinguishing between different mobility modes, and introducing a novel predictive maintenance system tailored for bikes. By utilising a combination of trip information and maintenance data from Barcelona’s bike-sharing system, Bicing, this study conducts an extensive analysis of mobility patterns and their relationship to failures of bike components. To accurately predict maintenance needs for essential bike parts, this research delves into various mobility metrics and applies statistical and machine learning survival models, including deep learning models. Due to their complexity, and with the objective of bolstering confidence in the system’s predictions, interpretability techniques explain the main predictors of maintenance needs. The analysis reveals marked differences in the usage patterns of mechanical bikes and electric bikes, with a growing user preference for the latter despite their extra costs. These differences in mobility were found to have a considerable impact on the maintenance needs within the bike-sharing system. Moreover, the predictive maintenance models proved effective in forecasting these maintenance needs, capable of operating across an entire bike fleet. 
Despite challenges such as approximated bike usage metrics and data imbalances, the study successfully showcases the feasibility of an accurate predictive maintenance system capable of improving operational costs, bike availability, and security.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"3 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141610702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science, Pub Date: 2024-07-10, DOI: 10.1140/epjds/s13688-024-00488-9
Alexander M. Petersen
{"title":"Shift in house price estimates during COVID-19 reveals effect of crisis on collective speculation","authors":"Alexander M. Petersen","doi":"10.1140/epjds/s13688-024-00488-9","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00488-9","url":null,"abstract":"<p>We exploit a city-level panel comprised of individual house price estimates to estimate the impact of COVID-19 on both small and big real-estate markets in California USA. Descriptive analysis of spot house price estimates, including contemporaneous price uncertainty and 30-day price change for individual properties listed on the online real-estate platform Zillow.com, together facilitate quantifying both the excess valuation and valuation confidence attributable to this global socio-economic shock. Our quasi-experimental pre-/post-COVID-19 design spans several years around 2020 and leverages contemporaneous price estimates of rental properties – i.e., off-market real estate entering the habitation market, just not for purchase and hence free of speculation – as an appropriate counterfactual to properties listed for sale, which are subject to on-market speculation. Combining unit-level matching and multivariate difference-in-difference regression approaches, we obtain consistent estimates regarding the sign and magnitude of excess price growth observed after the pandemic onset. Specifically, our results indicate that properties listed for sale appreciated an additional 1% per month above what would be expected in the absence of the pandemic. This corresponds to an excess annual price growth of roughly 12.7 percentage points, which accounts for more than half of the actual annual price growth in 2021 observed across the studied regions. Simultaneously, uncertainty in price estimates decreased, signaling the irrational confidence characteristic of prior asset bubbles. 
We explore how these two trends are related to market size, local market supply, and borrowing costs, which together lend support to the counterintuitive roles of uncertainty and interruptions in decision-making.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"54 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141588230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science, Pub Date: 2024-07-05, DOI: 10.1140/epjds/s13688-024-00487-w
Chengling Tang, Lei Dong, Hao Guo, Xuechen Wang, Xiao-Jian Chen, Quanhua Dong, Yu Liu
{"title":"Downscaling spatial interaction with socioeconomic attributes","authors":"Chengling Tang, Lei Dong, Hao Guo, Xuechen Wang, Xiao-Jian Chen, Quanhua Dong, Yu Liu","doi":"10.1140/epjds/s13688-024-00487-w","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00487-w","url":null,"abstract":"<p>A variety of complex socioeconomic phenomena, such as migration, commuting, and trade, can be abstracted as spatial interaction networks, where nodes represent geographic locations and weighted edges convey the interaction and its strength. However, obtaining fine-grained spatial interaction data is very challenging in practice due to limitations in collection methods and costs, so spatial interaction data such as transportation data and trade data are often only available at a coarse scale. Here, we propose a gravity downscaling (GD) method based on readily accessible socioeconomic data and the gravity law to infer fine-grained interactions from coarse-grained data. GD assumes that interactions at different spatial scales are governed by a similar gravity law, so the parameters estimated from coarse-grained regions can be transferred to fine-grained regions. Results show that GD achieves an average improvement of 24.6% in Mean Absolute Percentage Error over alternative downscaling methods (i.e., the areal-weighted method and machine learning models) across datasets with different spatial scales and in various regions. 
Using simple assumptions, GD enables accurate downscaling of spatial interactions, making it applicable to a wide range of fields, including human mobility, transportation, and trade.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"24 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141549802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science, Pub Date: 2024-06-28, DOI: 10.1140/epjds/s13688-024-00483-0
Minje Choi, Daniel M. Romero, David Jurgens
{"title":"Profile update: the effects of identity disclosure on network connections and language","authors":"Minje Choi, Daniel M. Romero, David Jurgens","doi":"10.1140/epjds/s13688-024-00483-0","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00483-0","url":null,"abstract":"<p>Our social identities determine how we interact and engage with the world surrounding us. In online settings, individuals can make these identities explicit by including them in their public biography, possibly signaling a change in what is important to them and how they should be viewed. While there is evidence suggesting the impact of intentional identity disclosure in online social platforms, its actual effect on engagement activities at the user level has yet to be explored. Here, we perform the first large-scale study on Twitter that examines behavioral changes following identity disclosure on Twitter profiles. Combining social networks with methods from natural language processing and quasi-experimental analyses, we discover that after disclosing an identity on their profiles, users (1) tweet and retweet more in a way that aligns with their respective identities, and (2) connect more with users that disclose similar identities. We also examine whether disclosing the identity increases the chance of being targeted for offensive comments and find that in fact (3) the combined effect of disclosing identity via both tweets and profiles is associated with a reduced number of offensive replies from others. 
Our findings highlight that the decision to disclose one’s identity in online spaces can lead to substantial changes in how users express themselves or forge connections, with a lesser degree of negative consequences than anticipated.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"29 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing user reactions using relevance between location information of tweets and news articles","authors":"Yun-Tae Jin, JaeBeom You, Shoko Wakamiya, Hyuk-Yoon Kwon","doi":"10.1140/epjds/s13688-024-00465-2","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00465-2","url":null,"abstract":"<p>In this study, we analyze the extent of user reactions to news articles based on users’ tweets, demonstrating the potential for home location prediction. To achieve this, we quantify users’ reactions to specific news articles based on the textual similarity between tweets and news articles, showing that users’ reactions to news articles about their own cities are significantly higher than those to articles about other cities. To maximize the difference in reactions, we introduce the concept of <i>News Distinctness</i>, which highlights the news articles that affect a specific location. By incorporating News Distinctness with users’ reactions to the news, we magnify its effects. Through experiments conducted with tweets collected from users whose home locations are in five representative cities within the United States and news articles describing events occurring in those cities, we observed a 6.75% to 40% improvement in the reaction score compared to the average reaction to news from outside the home location, which clearly indicates the home location. Furthermore, News Distinctness increases the difference in reaction score between news in the home location and the average of the news outside of the home location by 12% to 194%. 
These results demonstrate that our proposed idea can be used to predict users’ locations and, potentially, to recommend meaningful information based on the users’ areas of interest.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"10 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science, Pub Date: 2024-06-07, DOI: 10.1140/epjds/s13688-024-00481-2
Marco Bronzini, Carlo Nicolini, Bruno Lepri, Andrea Passerini, Jacopo Staiano
{"title":"Glitter or gold? Deriving structured insights from sustainability reports via large language models","authors":"Marco Bronzini, Carlo Nicolini, Bruno Lepri, Andrea Passerini, Jacopo Staiano","doi":"10.1140/epjds/s13688-024-00481-2","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00481-2","url":null,"abstract":"<p>Over the last decade, several regulatory bodies have started requiring the disclosure of non-financial information from publicly listed companies, in light of the investors’ increasing attention to Environmental, Social, and Governance (ESG) issues. Publicly released information on sustainability practices is often disclosed in diverse, unstructured, and multi-modal documentation. This poses a challenge in efficiently gathering and aligning the data into a unified framework to derive insights related to Corporate Social Responsibility (CSR). Thus, using Information Extraction (IE) methods becomes an intuitive choice for delivering insightful and actionable data to stakeholders. In this study, we employ Large Language Models (LLMs), In-Context Learning, and the Retrieval-Augmented Generation (RAG) paradigm to extract structured insights related to ESG aspects from companies’ sustainability reports. We then leverage graph-based representations to conduct statistical analyses concerning the extracted insights. These analyses revealed that ESG criteria cover a wide range of topics, exceeding 500, often beyond those considered in existing categorizations, and are addressed by companies through a variety of initiatives. Moreover, disclosure similarities emerged among companies from the same region or sector, validating ongoing hypotheses in the ESG literature. 
Lastly, by incorporating additional company attributes into our analyses, we investigated which factors most affect companies’ ESG ratings, showing that ESG disclosure affects the obtained ratings more than other financial or company data.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"64 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141549803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science, Pub Date: 2024-06-05, DOI: 10.1140/epjds/s13688-024-00480-3
Pau Muñoz, Alejandro Bellogín, Raúl Barba-Rojas, Fernando Díez
{"title":"Quantifying polarization in online political discourse","authors":"Pau Muñoz, Alejandro Bellogín, Raúl Barba-Rojas, Fernando Díez","doi":"10.1140/epjds/s13688-024-00480-3","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00480-3","url":null,"abstract":"<p>In an era of increasing political polarization, its analysis becomes crucial for the understanding of democratic dynamics. This paper presents comprehensive research on measuring political polarization on X (Twitter) during election cycles in Spain from 2011 to 2019. A broad comparative analysis is performed on algorithms used to identify and measure polarization or controversy on microblogging platforms. This analysis is specifically tailored towards publications made by official political party accounts during pre-campaign, campaign, election day, and the week post-election. Guided by the findings of this comparative evaluation, we propose a novel algorithm better suited to capture polarization in the context of political events, which is validated with real data. As a consequence, our research contributes a significant advance to political science, social network analysis, and computational social science more broadly by providing a realistic method to capture polarization from online political discourse.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"69 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science, Pub Date: 2024-05-17, DOI: 10.1140/epjds/s13688-024-00476-z
Oleg Sobchuk, Mason Youngblood, Olivier Morin
{"title":"First-mover advantage in music","authors":"Oleg Sobchuk, Mason Youngblood, Olivier Morin","doi":"10.1140/epjds/s13688-024-00476-z","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00476-z","url":null,"abstract":"<p>Why do some songs and musicians become successful while others do not? We show that one of the reasons may be the “first-mover advantage”: artists that stand at the foundation of new music genres tend to be more successful than those who join these genres later on. To test this hypothesis, we have analyzed a massive dataset of over 920,000 songs, including 110 music genres: 10 chosen intentionally and preregistered, and 100 chosen randomly. For this, we collected the data from two music services: Spotify, which provides detailed information about songs’ success (the precise number of times each song was listened to), and Every Noise at Once, which provides detailed genre tags for musicians. 91 genres, out of 110, show the first-mover advantage—clearly suggesting that it is an important mechanism in music success and evolution.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"14 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141064173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science, Pub Date: 2024-05-08, DOI: 10.1140/epjds/s13688-024-00473-2
Amir Mehrjoo, Rubén Cuevas, Ángel Cuevas
{"title":"Online advertisement in a pink-colored market","authors":"Amir Mehrjoo, Rubén Cuevas, Ángel Cuevas","doi":"10.1140/epjds/s13688-024-00473-2","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00473-2","url":null,"abstract":"<p>It is surprising that women are often charged more for products and services marketed explicitly to them. This phenomenon, known as the pink tax, is a major issue that questions women’s buying power. Nevertheless, it is not just limited to physical products – even online advertising can be subject to this type of gender-price discrimination. That is where our research comes in. We have developed a new methodology to measure what we call the digital marketing pink tax – the additional expense of delivering advertisements to female audiences. Analyzing data from Facebook advertising platforms across 187 countries and 40 territories shows this issue is systematic. Particularly, the digital marketing pink tax is prevalent in 79% of audiences across the world and 98% of audiences in highly developed countries. Therefore, advertisers incur a median cost of 30% more to display advertisements to women than men. In contrast, advertisers have to pay less digital marketing pink tax in less-developed countries (5%). Our research indicates that countries in the Middle East and Africa with a low Human Development Index (<i>HDI</i>) do not experience this phenomenon. Our comprehensive investigation of 24 industries reveals that advertisers must pay up to 64% of the digital marketing pink tax to target women in some industries. Our findings also suggest a connection between the digital marketing pink tax and the consumer pink tax – the extra charge placed on products marketed to women. Overall, our research sheds light on an important issue affecting women worldwide. 
It also raises awareness of the digital marketing pink tax and advocates for better regulation.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"59 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140925708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science, Pub Date: 2024-05-06, DOI: 10.1140/epjds/s13688-024-00475-0
Peter Mehler, Eva Iris Otto, Anna Sapienza
{"title":"Who makes open source code? The hybridisation of commercial and open source practices","authors":"Peter Mehler, Eva Iris Otto, Anna Sapienza","doi":"10.1140/epjds/s13688-024-00475-0","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00475-0","url":null,"abstract":"<p>While Free and Open Source (F/OSS) coding has traditionally been described as a separate commons linked to values of openness and sharing, recent research suggests an increasing integration of private corporations into F/OSS practices, blurring the boundaries between F/OSS and commodified coding. However, there is a dearth of empirical, and especially quantitative, studies exploring this phenomenon. To address this gap, we model the power dynamics and infrastructural aspects of software production within GitHub, a central hub for F/OSS development, using a large-scale, directed network. Using various network statistics, we detect the ecosystem’s most impactful actors and find a nuanced picture of the influence of individuals, open source organizations, and private corporations in F/OSS practices. We find that the majority of public repositories on GitHub depend on a small core of specialized repositories and users. In accordance with expectations, individuals and open source organizations are more prevalent in this core of elite GitHub users; however, we also find a significant number of private organizations with an indirect, yet consistent, influence within GitHub. In addition, we find that directly influential individuals tend to facilitate sponsorship methods more often than indirectly or non-influential individuals. 
Our research highlights a hybridization of F/OSS and sheds light on the complex interplay between influence, power, and code production in the multi-language dependency ecosystem of GitHub.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"61 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140883269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}