Journal of data science : JDS最新文献

筛选
英文 中文
Incorporating Interventions to an Extended SEIRD Model with Vaccination: Application to COVID-19 in Qatar 将干预措施纳入扩展SEIRD模型与疫苗接种:在卡塔尔COVID-19中的应用
Journal of data science : JDS Pub Date : 2022-04-23 DOI: 10.6339/23-JDS1105
Elizabeth B Amona, R. Ghanam, E. Boone, Indranil Sahoo, L. Abu-Raddad
{"title":"Incorporating Interventions to an Extended SEIRD Model with Vaccination: Application to COVID-19 in Qatar","authors":"Elizabeth B Amona, R. Ghanam, E. Boone, Indranil Sahoo, L. Abu-Raddad","doi":"10.6339/23-JDS1105","DOIUrl":"https://doi.org/10.6339/23-JDS1105","url":null,"abstract":"The COVID-19 outbreak of 2020 has required many governments to develop and adopt mathematical-statistical models of the pandemic for policy and planning purposes. To this end, this work provides a tutorial on building a compartmental model using Susceptible, Exposed, Infected, Recovered, Deaths and Vaccinated (SEIRDV) status through time. The proposed model uses interventions to quantify the impact of various government attempts made to slow the spread of the virus. Furthermore, a vaccination parameter is also incorporated in the model, which is inactive until the time the vaccine is deployed. A Bayesian framework is utilized to perform both parameter estimation and prediction. Predictions are made to determine when the peak Active Infections occur. We provide inferential frameworks for assessing the effects of government interventions on the dynamic progression of the pandemic, including the impact of vaccination. The proposed model also allows for quantification of number of excess deaths averted over the study period due to vaccination.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43542129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Causal Discovery for Observational Sciences Using Supervised Machine Learning 使用监督机器学习的观察科学因果发现
Journal of data science : JDS Pub Date : 2022-02-25 DOI: 10.6339/23-jds1088
A. H. Petersen, J. Ramsey, C. Ekstrøm, P. Spirtes
{"title":"Causal Discovery for Observational Sciences Using Supervised Machine Learning","authors":"A. H. Petersen, J. Ramsey, C. Ekstrøm, P. Spirtes","doi":"10.6339/23-jds1088","DOIUrl":"https://doi.org/10.6339/23-jds1088","url":null,"abstract":"Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models. Causal discovery algorithms are empirical methods for constructing such causal models from data. Several asymptotically correct discovery methods already exist, but they generally struggle on smaller samples. Moreover, most methods focus on very sparse causal models, which may not always be a realistic representation of real-life data generating mechanisms. Finally, while causal relationships suggested by the methods often hold true, their claims about causal non-relatedness have high error rates. This non-conservative error trade off is not ideal for observational sciences, where the resulting model is directly used to inform causal inference: A causal model with many missing causal relations entails too strong assumptions and may lead to biased effect estimates. We propose a new causal discovery method that addresses these three shortcomings: Supervised learning discovery (SLdisco). SLdisco uses supervised machine learning to obtain a mapping from observational data to equivalence classes of causal models. We evaluate SLdisco in a large simulation study based on Gaussian data and we consider several choices of model size and sample size. We find that SLdisco is more conservative, only moderately less informative and less sensitive towards sample size than existing procedures. We furthermore provide a real epidemiological data application. We use random subsampling to investigate real data performance on small samples and again find that SLdisco is less sensitive towards sample size and hence seems to better utilize the information available in small datasets.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45032104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Scalable Predictions for Spatial Probit Linear Mixed Models Using Nearest Neighbor Gaussian Processes. 使用最近邻高斯过程的空间Probit线性混合模型的可伸缩预测。
Journal of data science : JDS Pub Date : 2022-01-01 Epub Date: 2022-11-03 DOI: 10.6339/22-jds1073
Arkajyoti Saha, Abhirup Datta, Sudipto Banerjee
{"title":"Scalable Predictions for Spatial Probit Linear Mixed Models Using Nearest Neighbor Gaussian Processes.","authors":"Arkajyoti Saha, Abhirup Datta, Sudipto Banerjee","doi":"10.6339/22-jds1073","DOIUrl":"10.6339/22-jds1073","url":null,"abstract":"<p><p>Spatial probit generalized linear mixed models (spGLMM) with a linear fixed effect and a spatial random effect, endowed with a Gaussian Process prior, are widely used for analysis of binary spatial data. However, the canonical Bayesian implementation of this hierarchical mixed model can involve protracted Markov Chain Monte Carlo sampling. Alternate approaches have been proposed that circumvent this by directly representing the marginal likelihood from spGLMM in terms of multivariate normal cummulative distribution functions (cdf). We present a direct and fast rendition of this latter approach for predictions from a spatial probit linear mixed model. We show that the covariance matrix of the cdf characterizing the marginal cdf of binary spatial data from spGLMM is amenable to approximation using Nearest Neighbor Gaussian Processes (NNGP). This facilitates a scalable prediction algorithm for spGLMM using NNGP that only involves sparse or small matrix computations and can be deployed in an embarrassingly parallel manner. We demonstrate the accuracy and scalability of the algorithm via numerous simulation experiments and an analysis of species presence-absence data.</p>","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"20 4","pages":"533-544"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10544813/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41167232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Dynamic Classification of Plasmodium vivax Malaria Recurrence: An Application of Classifying Unknown Cause of Failure in Competing Risks. 间日疟原虫疟疾复发的动态分类:未知失败原因分类在竞争风险中的应用。
Journal of data science : JDS Pub Date : 2022-01-01 Epub Date: 2021-12-09 DOI: 10.6339/21-jds1026
Yutong Liu, Feng-Chang Lin, Jessica T Lin, Quefeng Li
{"title":"Dynamic Classification of <i>Plasmodium vivax</i> Malaria Recurrence: An Application of Classifying Unknown Cause of Failure in Competing Risks.","authors":"Yutong Liu,&nbsp;Feng-Chang Lin,&nbsp;Jessica T Lin,&nbsp;Quefeng Li","doi":"10.6339/21-jds1026","DOIUrl":"https://doi.org/10.6339/21-jds1026","url":null,"abstract":"<p><p>A standard competing risks set-up requires both time to event and cause of failure to be fully observable for all subjects. However, in application, the cause of failure may not always be observable, thus impeding the risk assessment. In some extreme cases, none of the causes of failure is observable. In the case of a recurrent episode of <i>Plasmodium vivax</i> malaria following treatment, the patient may have suffered a relapse from a previous infection or acquired a new infection from a mosquito bite. In this case, the time to relapse cannot be modeled when a competing risk, a new infection, is present. The efficacy of a treatment for preventing relapse from a previous infection may be underestimated when the true cause of infection cannot be classified. In this paper, we developed a novel method for classifying the latent cause of failure under a competing risks set-up, which uses not only time to event information but also transition likelihoods between covariates at the baseline and at the time of event occurrence. Our classifier shows superior performance under various scenarios in simulation experiments. The method was applied to <i>Plasmodium vivax</i> infection data to classify recurrent infections of malaria.</p>","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":"51-78"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9347664/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40585832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
The Python Package open-crypto: A Cryptocurrency Data Collector Python包open-crypto:一个加密货币数据收集器
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1059
Steffen Günther, C. Fieberg, Thorsten Poddig
{"title":"The Python Package open-crypto: A Cryptocurrency Data Collector","authors":"Steffen Günther, C. Fieberg, Thorsten Poddig","doi":"10.6339/22-jds1059","DOIUrl":"https://doi.org/10.6339/22-jds1059","url":null,"abstract":"This paper introduces the package open-crypto for free-of-charge and systematic cryptocurrency data collecting. The package supports several methods to request (1) static data, (2) real-time data and (3) historical data. It allows to retrieve data from over 100 of the most popular and liquid exchanges world-wide. New exchanges can easily be added with the help of provided templates or updated with build-in functions from the project repository. The package is available on GitHub and the Python package index (PyPi). The data is stored in a relational SQL database and therefore accessible from many different programming languages. We provide a hands-on and illustrations for each data type, explanations on the received data and also demonstrate the usability from R and Matlab. Academic research heavily relies on costly or confidential data, however, open data projects are becoming increasingly important. This project is mainly motivated to contribute to openly accessible software and free data in the cryptocurrency markets to improve transparency and reproducibility in research and any other disciplines.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multiresolution Broad Area Search: Monitoring Spatial Characteristics of Gapless Remote Sensing Data 多分辨率广域搜索:监测无间隙遥感数据的空间特征
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1072
Laura J. Wendelberger, J. Gray, Alyson G. Wilson, R. Houborg, B. Reich
{"title":"Multiresolution Broad Area Search: Monitoring Spatial Characteristics of Gapless Remote Sensing Data","authors":"Laura J. Wendelberger, J. Gray, Alyson G. Wilson, R. Houborg, B. Reich","doi":"10.6339/22-jds1072","DOIUrl":"https://doi.org/10.6339/22-jds1072","url":null,"abstract":"Global earth monitoring aims to identify and characterize land cover change like construction as it occurs. Remote sensing makes it possible to collect large amounts of data in near real-time over vast geographic areas and is becoming available in increasingly fine temporal and spatial resolution. Many methods have been developed for data from a single pixel, but monitoring pixel-wise spectral measurements over time neglects spatial relationships, which become more important as change manifests in a greater number of pixels in higher resolution imagery compared to moderate resolution. Building on our previous robust online Bayesian monitoring (roboBayes) algorithm, we propose monitoring multiresolution signals based on a wavelet decomposition to capture spatial change coherence on several scales to detect change sites. Monitoring only a subset of relevant signals reduces the computational burden. The decomposition relies on gapless data; we use 3 m Planet Fusion Monitoring data. Simulations demonstrate the superiority of the spatial signals in multiresolution roboBayes (MR roboBayes) for detecting subtle changes compared to pixel-wise roboBayes. We use MR roboBayes to detect construction changes in two regions with distinct land cover and seasonal characteristics: Jacksonville, FL (USA) and Dubai (UAE). It achieves site detection with less than two thirds of the monitoring processes required for pixel-wise roboBayes at the same resolution.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Subpopulation Treatment Effect Pattern Plot (STEPP) Methods with R and Stata 亚种群处理效应模式图(STEPP)方法
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1060
S. Venturini, M. Bonetti, A. Lazar, B. Cole, Xin Victoria Wang, R. Gelber, Wai-Ki Yip
{"title":"Subpopulation Treatment Effect Pattern Plot (STEPP) Methods with R and Stata","authors":"S. Venturini, M. Bonetti, A. Lazar, B. Cole, Xin Victoria Wang, R. Gelber, Wai-Ki Yip","doi":"10.6339/22-jds1060","DOIUrl":"https://doi.org/10.6339/22-jds1060","url":null,"abstract":"We introduce the stepp packages for R and Stata that implement the subpopulation treatment effect pattern plot (STEPP) method. STEPP is a nonparametric graphical tool aimed at examining possible heterogeneous treatment effects in subpopulations defined on a continuous covariate or composite score. More pecifically, STEPP considers overlapping subpopulations defined with respect to a continuous covariate (or risk index) and it estimates a treatment effect for each subpopulation. It also produces confidence regions and tests for treatment effect heterogeneity among the subpopulations. The original method has been extended in different directions such as different survival contexts, outcome types, or more efficient procedures for identifying the overlapping subpopulations. In this paper, we also introduce a novel method to determine the number of subjects within the subpopulations by minimizing the variability of the sizes of the subpopulations generated by a specific parameter combination. We illustrate the packages using both synthetic data and publicly available data sets. The most intensive computations in R are implemented in Fortran, while the Stata version exploits the powerful Mata language.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
The Impact of COVID-19 on Subjective Well-Being: Evidence from Twitter Data COVID-19对主观幸福感的影响:来自Twitter数据的证据
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1066
Tiziana Carpi, Airo Hino, S. Iacus, G. Porro
{"title":"The Impact of COVID-19 on Subjective Well-Being: Evidence from Twitter Data","authors":"Tiziana Carpi, Airo Hino, S. Iacus, G. Porro","doi":"10.6339/22-jds1066","DOIUrl":"https://doi.org/10.6339/22-jds1066","url":null,"abstract":"This study analyzes the impact of the COVID-19 pandemic on subjective well-being as measured through Twitter for the countries of Japan and Italy. In the first nine months of 2020, the Twitter indicators dropped by 11.7% for Italy and 8.3% for Japan compared to the last two months of 2019, and even more compared to their historical means. To understand what affected the Twitter mood so strongly, the study considers a pool of potential factors including: climate and air quality data, number of COVID-19 cases and deaths, Facebook COVID-19 and flu-like symptoms global survey data, coronavirus-related Google search data, policy intervention measures, human mobility data, macro economic variables, as well as health and stress proxy variables. This study proposes a framework to analyse and assess the relative impact of these external factors on the dynamic of Twitter mood and further implements a structural model to describe the underlying concept of subjective well-being. It turns out that prolonged mobility restrictions, flu and Covid-like symptoms, economic uncertainty and low levels of quality in social interactions have a negative impact on well-being.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
What Kind of Music Do You Like? A Statistical Analysis of Music Genre Popularity Over Time 你喜欢什么样的音乐?音乐类型随时间流行的统计分析
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1040
Aimée M. Petitbon, D. B. Hitchcock
{"title":"What Kind of Music Do You Like? A Statistical Analysis of Music Genre Popularity Over Time","authors":"Aimée M. Petitbon, D. B. Hitchcock","doi":"10.6339/22-jds1040","DOIUrl":"https://doi.org/10.6339/22-jds1040","url":null,"abstract":"<jats:p />","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Sampling-based Gaussian Mixture Regression for Big Data 基于抽样的大数据高斯混合回归
Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1057
Joochul Lee, E. Schifano, Haiying Wang
{"title":"Sampling-based Gaussian Mixture Regression for Big Data","authors":"Joochul Lee, E. Schifano, Haiying Wang","doi":"10.6339/22-jds1057","DOIUrl":"https://doi.org/10.6339/22-jds1057","url":null,"abstract":"This paper proposes a nonuniform subsampling method for finite mixtures of regression models to reduce large data computational tasks. A general estimator based on a subsample is investigated, and its asymptotic normality is established. We assign optimal subsampling probabilities to data points that minimize the asymptotic mean squared errors of the general estimator and linearly transformed estimators. Since the proposed probabilities depend on unknown parameters, an implementable algorithm is developed. We first approximate the optimal subsampling probabilities using a pilot sample. After that, we select a subsample using the approximated subsampling probabilities and compute estimates using the subsample. We evaluate the proposed method in a simulation study and present a real data example using appliance energy data.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信