EPJ Data Science | Pub Date: 2023-01-01 | Epub Date: 2023-06-06 | DOI: 10.1140/epjds/s13688-023-00395-5
Nicolò Gozzi, Niccolò Comini, Nicola Perra
{"title":"The adoption of non-pharmaceutical interventions and the role of digital infrastructure during the COVID-19 pandemic in Colombia, Ecuador, and El Salvador.","authors":"Nicolò Gozzi, Niccolò Comini, Nicola Perra","doi":"10.1140/epjds/s13688-023-00395-5","DOIUrl":"10.1140/epjds/s13688-023-00395-5","url":null,"abstract":"<p><p>Adherence to the non-pharmaceutical interventions (NPIs) put in place to mitigate the spreading of infectious diseases is a multifaceted problem. Several factors, including socio-demographic and socio-economic attributes, can influence the perceived susceptibility and risk which are known to affect behavior. Furthermore, the adoption of NPIs is dependent upon the barriers, real or perceived, associated with their implementation. Here, we study the determinants of NPIs adherence during the first wave of the COVID-19 Pandemic in Colombia, Ecuador, and El Salvador. Analyses are performed at the level of municipalities and include socio-economic, socio-demographic, and epidemiological indicators. Furthermore, by leveraging a unique dataset comprising tens of millions of internet Speedtest® measurements from Ookla®, we investigate the quality of the digital infrastructure as a possible barrier to adoption. We use mobility changes provided by Meta as a proxy of adherence to NPIs and find a significant correlation between mobility drops and digital infrastructure quality. The relationship remains significant after controlling for several factors. This finding suggests that municipalities with better internet connectivity were able to afford higher mobility reductions. 
We also find that mobility reductions were more pronounced in larger, denser, and wealthier municipalities.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1140/epjds/s13688-023-00395-5.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"12 1","pages":"18"},"PeriodicalIF":3.6,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10243255/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9612333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science | Pub Date: 2023-01-01 | DOI: 10.1140/epjds/s13688-023-00389-3
Johannes Wachs
{"title":"Digital traces of brain drain: developers during the Russian invasion of Ukraine.","authors":"Johannes Wachs","doi":"10.1140/epjds/s13688-023-00389-3","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00389-3","url":null,"abstract":"<p><p>The Russian invasion of Ukraine has caused large scale destruction, significant loss of life, and the displacement of millions of people. Besides those fleeing direct conflict in Ukraine, many individuals in Russia are also thought to have moved to third countries. In particular the exodus of skilled human capital, sometimes called brain drain, out of Russia may have a significant effect on the course of the war and the Russian economy in the long run. Yet quantifying brain drain, especially during crisis situations is generally difficult. This hinders our ability to understand its drivers and to anticipate its consequences. To address this gap, I draw on and extend a large scale dataset of the locations of highly active software developers collected in February 2021, one year before the invasion. Revisiting those developers that had been located in Russia in 2021, I confirm an ongoing exodus of developers from Russia in snapshots taken in June and November 2022. By November 11.1% of Russian developers list a new country, compared with 2.8% of developers from comparable countries in the region but not directly involved in the conflict. 13.2% of Russian developers have obscured their location (vs. 2.4% in the comparison set). Developers leaving Russia were significantly more active and central in the collaboration network than those who remain. This suggests that many of the most important developers have already left Russia. 
In some receiving countries the number of arrivals is significant: I estimate an increase in the number of local software developers of 42% in Armenia, 60% in Cyprus and 94% in Georgia.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1140/epjds/s13688-023-00389-3.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"12 1","pages":"14"},"PeriodicalIF":3.6,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10184088/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9557423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science | Pub Date: 2023-01-01 | DOI: 10.1140/epjds/s13688-023-00391-9
Xiao Fan Liu, Zhen-Zhen Wang, Xiao-Ke Xu, Ye Wu, Zhidan Zhao, Huarong Deng, Ping Wang, Naipeng Chao, Yi-Hui C Huang
{"title":"The shock, the coping, the resilience: smartphone application use reveals Covid-19 lockdown effects on human behaviors.","authors":"Xiao Fan Liu, Zhen-Zhen Wang, Xiao-Ke Xu, Ye Wu, Zhidan Zhao, Huarong Deng, Ping Wang, Naipeng Chao, Yi-Hui C Huang","doi":"10.1140/epjds/s13688-023-00391-9","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00391-9","url":null,"abstract":"<p><p>Human mobility restriction policies have been widely used to contain the coronavirus disease-19 (COVID-19). However, a critical question is how these policies affect individuals' behavioral and psychological well-being during and after confinement periods. Here, we analyze China's five most stringent city-level lockdowns in 2021, treating them as natural experiments that allow for examining behavioral changes in millions of people through smartphone application use. We made three fundamental observations. First, the use of physical and economic activity-related apps experienced a steep decline, yet apps that provide daily necessities maintained normal usage. Second, apps that fulfilled lower-level human needs, such as working, socializing, information seeking, and entertainment, saw an immediate and substantial increase in screen time. Those that satisfied higher-level needs, such as education, only attracted delayed attention. Third, human behaviors demonstrated resilience as most routines resumed after the lockdowns were lifted. 
Nonetheless, long-term lifestyle changes were observed, as significant numbers of people chose to continue working and learning online, becoming \"digital residents.\" This study also demonstrates the capability of smartphone screen time analytics in the study of human behaviors.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1140/epjds/s13688-023-00391-9.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"12 1","pages":"17"},"PeriodicalIF":3.6,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10240109/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9947205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science | Pub Date: 2023-01-01 | Epub Date: 2023-05-18 | DOI: 10.1140/epjds/s13688-023-00390-w
Yanni Yang, Alex Pentland, Esteban Moro
{"title":"Identifying latent activity behaviors and lifestyles using mobility data to describe urban dynamics.","authors":"Yanni Yang, Alex Pentland, Esteban Moro","doi":"10.1140/epjds/s13688-023-00390-w","DOIUrl":"10.1140/epjds/s13688-023-00390-w","url":null,"abstract":"<p><p>Urbanization and its problems require an in-depth and comprehensive understanding of urban dynamics, especially the complex and diversified lifestyles in modern cities. Digitally acquired data can accurately capture complex human activity, but it lacks the interpretability of demographic data. In this paper, we study a privacy-enhanced dataset of the mobility visitation patterns of 1.2 million people to 1.1 million places in 11 metro areas in the U.S. to detect the latent mobility behaviors and lifestyles in the largest American cities. Despite the considerable complexity of mobility visitations, we found that lifestyles can be automatically decomposed into only 12 latent interpretable activity behaviors on how people combine shopping, eating, working, or using their free time. Rather than describing individuals with a single lifestyle, we find that city dwellers' behavior is a mixture of those behaviors. Those detected latent activity behaviors are equally present across cities and cannot be fully explained by main demographic features. Finally, we find those latent behaviors are associated with dynamics like experienced income segregation, transportation, or healthy behaviors in cities, even after controlling for demographic features. 
Our results signal the importance of complementing traditional census data with activity behaviors to understand urban dynamics.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1140/epjds/s13688-023-00390-w.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"12 1","pages":"15"},"PeriodicalIF":3.6,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10193357/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9509481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science | Pub Date: 2023-01-01 | DOI: 10.1140/epjds/s13688-023-00387-5
Teo Susnjak, Paula Maddigan
{"title":"Forecasting patient flows with pandemic induced concept drift using explainable machine learning.","authors":"Teo Susnjak, Paula Maddigan","doi":"10.1140/epjds/s13688-023-00387-5","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00387-5","url":null,"abstract":"<p><p>Accurately forecasting patient arrivals at Urgent Care Clinics (UCCs) and Emergency Departments (EDs) is important for effective resourcing and patient care. However, correctly estimating patient flows is not straightforward since it depends on many drivers. The predictability of patient arrivals has recently been further complicated by the COVID-19 pandemic conditions and the resulting lockdowns. This study investigates how a suite of novel quasi-real-time variables like Google search terms, pedestrian traffic, the prevailing incidence levels of influenza, as well as the COVID-19 Alert Level indicators can both generally improve the forecasting models of patient flows and effectively adapt the models to the unfolding disruptions of pandemic conditions. This research also uniquely contributes to the body of work in this domain by employing tools from the eXplainable AI field to investigate more deeply the internal mechanics of the models than has previously been done. The Voting ensemble-based method combining machine learning and statistical techniques was the most reliable in our experiments. Our study showed that the prevailing COVID-19 Alert Level feature together with Google search terms and pedestrian traffic were effective at producing generalisable forecasts. The implications of this study are that proxy variables can effectively augment standard autoregressive features to ensure accurate forecasting of patient flows. 
The experiments showed that the proposed features are potentially effective model inputs for preserving forecast accuracies in the event of future pandemic outbreaks.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"12 1","pages":"11"},"PeriodicalIF":3.6,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10119825/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9448957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science | Pub Date: 2023-01-01 | DOI: 10.1140/epjds/s13688-023-00383-9
Kin Wai Ng, Frederick Mubang, Lawrence O Hall, John Skvoretz, Adriana Iamnitchi
{"title":"Experimental evaluation of baselines for forecasting social media timeseries.","authors":"Kin Wai Ng, Frederick Mubang, Lawrence O Hall, John Skvoretz, Adriana Iamnitchi","doi":"10.1140/epjds/s13688-023-00383-9","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00383-9","url":null,"abstract":"<p><p>Forecasting social media activity can be of practical use in many scenarios, from understanding trends, such as which topics are likely to engage more users in the coming week, to identifying unusual behavior, such as coordinated information operations or currency manipulation efforts. To evaluate a new approach to forecasting, it is important to have baselines against which to assess performance gains. We experimentally evaluate the performance of four baselines for forecasting activity in several social media datasets that record discussions related to three different geo-political contexts synchronously taking place on two different platforms, Twitter and YouTube. Experiments are done over hourly time periods. Our evaluation identifies the baselines which are most accurate for particular metrics and thus provides guidance for future work in social media modeling.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"12 1","pages":"8"},"PeriodicalIF":3.6,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10042102/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9594413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science | Pub Date: 2023-01-01 | Epub Date: 2023-06-07 | DOI: 10.1140/epjds/s13688-023-00394-6
Esra Suel, Emily Muller, James E Bennett, Tony Blakely, Yvonne Doyle, John Lynch, Joreintje D Mackenbach, Ariane Middel, Anja Mizdrak, Ricky Nathvani, Michael Brauer, Majid Ezzati
{"title":"Do poverty and wealth look the same the world over? A comparative study of 12 cities from five high-income countries using street images.","authors":"Esra Suel, Emily Muller, James E Bennett, Tony Blakely, Yvonne Doyle, John Lynch, Joreintje D Mackenbach, Ariane Middel, Anja Mizdrak, Ricky Nathvani, Michael Brauer, Majid Ezzati","doi":"10.1140/epjds/s13688-023-00394-6","DOIUrl":"10.1140/epjds/s13688-023-00394-6","url":null,"abstract":"<p><p>Urbanization and inequalities are two of the major policy themes of our time, intersecting in large cities where social and economic inequalities are particularly pronounced. Large scale street-level images are a source of city-wide visual information and allow for comparative analyses of multiple cities. Computer vision methods based on deep learning applied to street images have been shown to successfully measure inequalities in socioeconomic and environmental features, yet existing work has been within specific geographies and have not looked at how visual environments compare across different cities and countries. In this study, we aim to apply existing methods to understand whether, and to what extent, poor and wealthy groups live in visually similar neighborhoods across cities and countries. We present novel insights on similarity of neighborhoods using street-level images and deep learning methods. We analyzed 7.2 million images from 12 cities in five high-income countries, home to more than 85 million people: Auckland (New Zealand), Sydney (Australia), Toronto and Vancouver (Canada), Atlanta, Boston, Chicago, Los Angeles, New York, San Francisco, and Washington D.C. (United States of America), and London (United Kingdom). Visual features associated with neighborhood disadvantage are more distinct and unique to each city than those associated with affluence. 
For example, from what is visible from street images, high density poor neighborhoods located near the city center (e.g., in London) are visually distinct from poor suburban neighborhoods characterized by lower density and lower accessibility (e.g., in Atlanta). This suggests that differences between two cities are also driven by historical factors, policies, and local geography. Our results also have implications for image-based measures of inequality in cities especially when trained on data from cities that are visually distinct from target cities. We showed that these are more prone to errors for disadvantaged areas especially when transferring across cities, suggesting more attention needs to be paid to improving methods for capturing heterogeneity in poor environments across cities around the world.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1140/epjds/s13688-023-00394-6.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"12 1","pages":"19"},"PeriodicalIF":3.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10245348/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9982453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science | Pub Date: 2021-01-01 | Epub Date: 2021-01-07 | DOI: 10.1140/epjds/s13688-020-00257-4
Demetris Avraam, Rebecca Wilson, Oliver Butters, Thomas Burton, Christos Nicolaides, Elinor Jones, Andy Boyd, Paul Burton
{"title":"Privacy preserving data visualizations.","authors":"Demetris Avraam, Rebecca Wilson, Oliver Butters, Thomas Burton, Christos Nicolaides, Elinor Jones, Andy Boyd, Paul Burton","doi":"10.1140/epjds/s13688-020-00257-4","DOIUrl":"10.1140/epjds/s13688-020-00257-4","url":null,"abstract":"<p><p>Data visualizations are a valuable tool used during both statistical analysis and the interpretation of results as they graphically reveal useful information about the structure, properties and relationships between variables, which may otherwise be concealed in tabulated data. In disciplines like medicine and the social sciences, where collected data include sensitive information about study participants, the sharing and publication of individual-level records is controlled by data protection laws and ethico-legal norms. Thus, as data visualizations - such as graphs and plots - may be linked to other released information and used to identify study participants and their personal attributes, their creation is often prohibited by the terms of data use. These restrictions are enforced to reduce the risk of breaching data subject confidentiality, however they limit analysts from displaying useful descriptive plots for their research features and findings. Here we propose the use of anonymization techniques to generate privacy-preserving visualizations that retain the statistical properties of the underlying data while still adhering to strict data disclosure rules. We demonstrate the use of (i) the well-known <i>k</i>-anonymization process which preserves privacy by reducing the granularity of the data using suppression and generalization, (ii) a novel deterministic approach that replaces individual-level observations with the centroids of each <i>k</i> nearest neighbours, and (iii) a probabilistic procedure that perturbs individual attributes with the addition of random stochastic noise. 
We apply the proposed methods to generate privacy-preserving data visualizations for exploratory data analysis and inferential regression plot diagnostics, and we discuss their strengths and limitations.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"10 1","pages":"2"},"PeriodicalIF":3.6,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7790778/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9501992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EPJ Data Science | Pub Date: 2021-01-01 | DOI: 10.1140/epjds/s13688-021-00271-0
Thayer Alshaabi, David Rushing Dewhurst, Joshua R Minot, Michael V Arnold, Jane L Adams, Christopher M Danforth, Peter Sheridan Dodds
{"title":"The growing amplification of social media: measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009-2020.","authors":"Thayer Alshaabi, David Rushing Dewhurst, Joshua R Minot, Michael V Arnold, Jane L Adams, Christopher M Danforth, Peter Sheridan Dodds","doi":"10.1140/epjds/s13688-021-00271-0","DOIUrl":"https://doi.org/10.1140/epjds/s13688-021-00271-0","url":null,"abstract":"<p><p>Working from a dataset of 118 billion messages running from the start of 2009 to the end of 2019, we identify and explore the relative daily use of over 150 languages on Twitter. We find that eight languages comprise 80% of all tweets, with English, Japanese, Spanish, Arabic, and Portuguese being the most dominant. To quantify social spreading in each language over time, we compute the 'contagion ratio': The balance of retweets to organic messages. We find that for the most common languages on Twitter there is a growing tendency, though not universal, to retweet rather than share new content. By the end of 2019, the contagion ratios for half of the top 30 languages, including English and Spanish, had reached above 1-the naive contagion threshold. In 2019, the top 5 languages with the highest average daily ratios were, in order, Thai (7.3), Hindi, Tamil, Urdu, and Catalan, while the bottom 5 were Russian, Swedish, Esperanto, Cebuano, and Finnish (0.26). 
Further, we show that over time, the contagion ratios for most common languages are growing more strongly than those of rare languages.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"10 1","pages":"15"},"PeriodicalIF":3.6,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8010293/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9202259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}