Journal of data science : JDS: Latest Articles, Page 4

Incorporating Interventions to an Extended SEIRD Model with Vaccination: Application to COVID-19 in Qatar

Journal of data science : JDS Pub Date : 2022-04-23 DOI: 10.6339/23-JDS1105

Elizabeth B Amona, R. Ghanam, E. Boone, Indranil Sahoo, L. Abu-Raddad

Citations: 1

Causal Discovery for Observational Sciences Using Supervised Machine Learning

Journal of data science : JDS Pub Date : 2022-02-25 DOI: 10.6339/23-jds1088

A. H. Petersen, J. Ramsey, C. Ekstrøm, P. Spirtes

Abstract: Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models. Causal discovery algorithms are empirical methods for constructing such causal models from data. Several asymptotically correct discovery methods already exist, but they generally struggle on smaller samples. Moreover, most methods focus on very sparse causal models, which may not always be a realistic representation of real-life data-generating mechanisms. Finally, while causal relationships suggested by the methods often hold true, their claims about causal non-relatedness have high error rates. This non-conservative error trade-off is not ideal for observational sciences, where the resulting model is directly used to inform causal inference: a causal model with many missing causal relations entails overly strong assumptions and may lead to biased effect estimates. We propose a new causal discovery method that addresses these three shortcomings: Supervised learning discovery (SLdisco). SLdisco uses supervised machine learning to obtain a mapping from observational data to equivalence classes of causal models. We evaluate SLdisco in a large simulation study based on Gaussian data, considering several choices of model size and sample size. We find that SLdisco is more conservative, only moderately less informative, and less sensitive to sample size than existing procedures. We also provide a real epidemiological data application, using random subsampling to investigate real-data performance on small samples. Again, SLdisco is less sensitive to sample size and hence seems to make better use of the information available in small datasets.

Citations: 7

Scalable Predictions for Spatial Probit Linear Mixed Models Using Nearest Neighbor Gaussian Processes

Journal of data science : JDS Pub Date : 2022-01-01 Epub Date: 2022-11-03 DOI: 10.6339/22-jds1073

Arkajyoti Saha, Abhirup Datta, Sudipto Banerjee

Abstract: Spatial probit generalized linear mixed models (spGLMM) with a linear fixed effect and a spatial random effect, endowed with a Gaussian process prior, are widely used for the analysis of binary spatial data. However, the canonical Bayesian implementation of this hierarchical mixed model can involve protracted Markov chain Monte Carlo sampling. Alternative approaches have been proposed that circumvent this by directly representing the marginal likelihood from spGLMM in terms of multivariate normal cumulative distribution functions (cdf). We present a direct and fast rendition of this latter approach for predictions from a spatial probit linear mixed model. We show that the covariance matrix of the cdf characterizing the marginal cdf of binary spatial data from spGLMM is amenable to approximation using Nearest Neighbor Gaussian Processes (NNGP). This facilitates a scalable prediction algorithm for spGLMM using NNGP that involves only sparse or small matrix computations and can be deployed in an embarrassingly parallel manner. We demonstrate the accuracy and scalability of the algorithm via numerous simulation experiments and an analysis of species presence-absence data.

Vol. 20, No. 4, pp. 533-544. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10544813/pdf/

Citations: 0
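The NNGP approximation named in the abstract above replaces a dense Gaussian-process density with a product of conditionals. Each location conditions only on its m nearest previously ordered neighbours. The sketch below illustrates that generic factorization for a zero-mean Gaussian log-density. It is not the authors' spGLMM prediction algorithm: in the paper, the approximation is applied to the covariance of a multivariate normal cdf. The exponential covariance, the default `sigma2`/`phi` values, the ordering by input position, and the function names (`exp_cov`, `solve`, `nngp_loglik`) are all assumptions made for illustration.

```python
import math

def exp_cov(a, b, sigma2=1.0, phi=1.0):
    """Exponential covariance between two locations (assumed kernel)."""
    return sigma2 * math.exp(-phi * math.dist(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def nngp_loglik(coords, y, m=2, **kw):
    """Approximate log N(y | 0, C) as the sum of log p(y_i | y_N(i)), where N(i)
    holds the m nearest neighbours of location i among locations 0..i-1."""
    ll = 0.0
    for i, s in enumerate(coords):
        # Naive neighbour search over all earlier locations.
        prev = sorted(range(i), key=lambda j: math.dist(coords[j], s))[:m]
        c_ii = exp_cov(s, s, **kw)
        if prev:
            C_nn = [[exp_cov(coords[j], coords[k], **kw) for k in prev] for j in prev]
            c_ni = [exp_cov(coords[j], s, **kw) for j in prev]
            w = solve(C_nn, c_ni)  # kriging weights on the neighbours
            mean = sum(wj * y[j] for wj, j in zip(w, prev))
            var = c_ii - sum(wj * cj for wj, cj in zip(w, c_ni))
        else:
            mean, var = 0.0, c_ii
        ll += -0.5 * (math.log(2 * math.pi * var) + (y[i] - mean) ** 2 / var)
    return ll
```

When m is at least n - 1, every location conditions on all of its predecessors, and the sum equals the exact Gaussian log-density. A smaller m means each conditional needs only an O(m^3) solve, instead of one dense O(n^3) factorization.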

Dynamic Classification of Plasmodium vivax Malaria Recurrence: An Application of Classifying Unknown Cause of Failure in Competing Risks

Journal of data science : JDS Pub Date : 2022-01-01 Epub Date: 2021-12-09 DOI: 10.6339/21-jds1026

Yutong Liu, Feng-Chang Lin, Jessica T Lin, Quefeng Li

Abstract: A standard competing risks set-up requires both the time to event and the cause of failure to be fully observable for all subjects. In applications, however, the cause of failure may not always be observable, which impedes risk assessment; in some extreme cases, none of the causes of failure is observable. When a patient has a recurrent episode of Plasmodium vivax malaria following treatment, the patient may have suffered a relapse from a previous infection or acquired a new infection from a mosquito bite. In this case, the time to relapse cannot be modeled when a competing risk, a new infection, is present. The efficacy of a treatment for preventing relapse from a previous infection may therefore be underestimated when the true cause of infection cannot be classified. In this paper, we develop a novel method for classifying the latent cause of failure under a competing risks set-up. The method uses not only time-to-event information but also transition likelihoods between covariates at baseline and at the time of event occurrence. Our classifier shows superior performance under various scenarios in simulation experiments. The method was applied to Plasmodium vivax infection data to classify recurrent infections of malaria.

pp. 51-78. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9347664/pdf/

Citations: 1
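To make the classification idea above concrete, the sketch below applies Bayes' rule to a recurrence whose cause is unobserved. It combines a cause-specific event-time density with a likelihood for the covariate transition between baseline and event, the two ingredients the abstract names. This is not the authors' classifier. The exponential event-time densities, the Gaussian model for the change in a single covariate, the prior weights, every parameter value, and the names `CAUSES` and `classify` are hypothetical.

```python
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def exp_pdf(t, rate):
    return rate * math.exp(-rate * t)

# Hypothetical per-cause models: prior weight, event-time rate (per day),
# and mean/sd of the covariate change between baseline and event.
CAUSES = {
    "relapse":       {"prior": 0.6, "rate": 1 / 45,  "delta_mu": 0.0, "delta_sd": 1.0},
    "new_infection": {"prior": 0.4, "rate": 1 / 120, "delta_mu": 2.0, "delta_sd": 1.0},
}

def classify(t, x_baseline, x_event, causes=CAUSES):
    """Posterior P(cause | event time t, covariate transition) for each cause."""
    delta = x_event - x_baseline
    scores = {
        k: p["prior"] * exp_pdf(t, p["rate"]) * normal_pdf(delta, p["delta_mu"], p["delta_sd"])
        for k, p in causes.items()
    }
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}
```

For example, `classify(30, 0.1, 0.2)` favours `"relapse"`: an early recurrence with little covariate change fits that cause better under these assumed parameters. A late recurrence with a large covariate shift favours `"new_infection"`.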

The Python Package open-crypto: A Cryptocurrency Data Collector

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1059

Steffen Günther, C. Fieberg, Thorsten Poddig

Citations: 0

Multiresolution Broad Area Search: Monitoring Spatial Characteristics of Gapless Remote Sensing Data

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1072

Laura J. Wendelberger, J. Gray, Alyson G. Wilson, R. Houborg, B. Reich

Abstract: Global earth monitoring aims to identify and characterize land-cover change, such as construction, as it occurs. Remote sensing makes it possible to collect large amounts of data in near real time over vast geographic areas, and such data are becoming available at increasingly fine temporal and spatial resolution. Many methods have been developed for data from a single pixel, but monitoring pixel-wise spectral measurements over time neglects spatial relationships. These relationships become more important in higher-resolution imagery, where a change spans more pixels than it would at moderate resolution. Building on our previous robust online Bayesian monitoring (roboBayes) algorithm, we propose monitoring multiresolution signals based on a wavelet decomposition to capture spatial change coherence on several scales to detect change sites. Monitoring only a subset of relevant signals reduces the computational burden. The decomposition relies on gapless data; we use 3 m Planet Fusion Monitoring data. Simulations demonstrate the superiority of the spatial signals in multiresolution roboBayes (MR roboBayes) for detecting subtle changes compared to pixel-wise roboBayes. We use MR roboBayes to detect construction changes in two regions with distinct land cover and seasonal characteristics: Jacksonville, FL (USA) and Dubai (UAE). It achieves site detection with less than two thirds of the monitoring processes required for pixel-wise roboBayes at the same resolution.

Citations: 1
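The multiresolution signals in MR roboBayes come from a wavelet decomposition of gapless imagery. As a generic illustration only, the sketch below computes one level of an unnormalized 2-D Haar decomposition over 2×2 blocks: local averages plus three detail coefficients. Repeating it on the approximation image yields coarser scales. The abstract does not say which wavelet the authors use. The Haar choice, the divide-by-4 normalization, and the names `haar2d_level` and `haar_pyramid` are assumptions.

```python
def haar2d_level(img):
    """One level of an (unnormalized) 2-D Haar transform.

    img: list of rows with even height and width.
    Returns (approx, horiz, vert, diag), each half the size in both dimensions.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, h, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            a_row.append((a + b + c + d) / 4)  # local mean
            h_row.append((a + b - c - d) / 4)  # top minus bottom (horizontal edges)
            v_row.append((a - b + c - d) / 4)  # left minus right (vertical edges)
            d_row.append((a - b - c + d) / 4)  # diagonal contrast
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag

def haar_pyramid(img, levels):
    """Final approximation plus (horiz, vert, diag) details for each level, finest first."""
    details = []
    for _ in range(levels):
        img, hz, vt, dg = haar2d_level(img)
        details.append((hz, vt, dg))
    return img, details
```

Each pixel in a block is recoverable from the block's four coefficients; for example, the top-left pixel equals approx + horiz + vert + diag. The coarser levels therefore summarize spatially coherent change without discarding information.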

Subpopulation Treatment Effect Pattern Plot (STEPP) Methods with R and Stata

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1060

S. Venturini, M. Bonetti, A. Lazar, B. Cole, Xin Victoria Wang, R. Gelber, Wai-Ki Yip

Abstract: We introduce the stepp packages for R and Stata that implement the subpopulation treatment effect pattern plot (STEPP) method. STEPP is a nonparametric graphical tool aimed at examining possible heterogeneous treatment effects in subpopulations defined on a continuous covariate or composite score. More specifically, STEPP considers overlapping subpopulations defined with respect to a continuous covariate (or risk index) and estimates a treatment effect for each subpopulation. It also produces confidence regions and tests for treatment effect heterogeneity among the subpopulations. The original method has been extended in several directions, such as different survival contexts, outcome types, and more efficient procedures for identifying the overlapping subpopulations. In this paper, we also introduce a novel method for determining the number of subjects within the subpopulations by minimizing the variability of the sizes of the subpopulations generated by a specific parameter combination. We illustrate the packages using both synthetic data and publicly available data sets. The most intensive computations in R are implemented in Fortran, while the Stata version exploits the powerful Mata language.

Citations: 0
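STEPP, as described above, estimates a treatment effect within overlapping subpopulations ordered by a continuous covariate. The sketch below builds sliding-window subpopulations of fixed `size` that advance by `step` subjects, so consecutive windows share `size - step` subjects. It then reports a simple difference in mean outcome between arms in each window. The window parameters, the difference-in-means effect, and the names `sliding_windows` and `stepp_effects` are illustrative assumptions, not the API of the `stepp` packages. Those packages also handle survival outcomes, confidence regions, heterogeneity tests, and the size-balancing procedure the abstract introduces.

```python
from statistics import mean

def sliding_windows(covariate, size, step):
    """Index lists of overlapping subpopulations ordered by the covariate.
    Trailing subjects that do not fill a final window are ignored in this sketch."""
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    windows = []
    start = 0
    while start + size <= len(order):
        windows.append(order[start:start + size])
        start += step
    return windows

def stepp_effects(covariate, treated, outcome, size, step):
    """(mean covariate, treated-minus-control mean outcome) for each window;
    the effect is None when a window contains only one arm."""
    results = []
    for idx in sliding_windows(covariate, size, step):
        trt = [outcome[i] for i in idx if treated[i]]
        ctl = [outcome[i] for i in idx if not treated[i]]
        effect = mean(trt) - mean(ctl) if trt and ctl else None
        results.append((mean(covariate[i] for i in idx), effect))
    return results
```

Plotting the effect against the window's mean covariate gives the basic STEPP-style picture of how the treatment effect varies across the covariate range.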

The Impact of COVID-19 on Subjective Well-Being: Evidence from Twitter Data

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1066

Tiziana Carpi, Airo Hino, S. Iacus, G. Porro

Citations: 2

What Kind of Music Do You Like? A Statistical Analysis of Music Genre Popularity Over Time

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1040

Aimée M. Petitbon, D. B. Hitchcock

Citations: 2

Sampling-based Gaussian Mixture Regression for Big Data

Journal of data science : JDS Pub Date : 2022-01-01 DOI: 10.6339/22-jds1057

Joochul Lee, E. Schifano, Haiying Wang

Citations: 0