Journal of data science : JDS最新文献

筛选
英文 中文
Impact of Bias Correction of the Least Squares Estimation on Bootstrap Confidence Intervals for Bifurcating Autoregressive Models 最小二乘估计偏差校正对分岔自回归模型自举置信区间的影响
Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1092
T. Elbayoumi, S. Mostafa
{"title":"Impact of Bias Correction of the Least Squares Estimation on Bootstrap Confidence Intervals for Bifurcating Autoregressive Models","authors":"T. Elbayoumi, S. Mostafa","doi":"10.6339/23-jds1092","DOIUrl":"https://doi.org/10.6339/23-jds1092","url":null,"abstract":"The least squares (LS) estimator of the autoregressive coefficient in the bifurcating autoregressive (BAR) model was recently shown to suffer from substantial bias, especially for small to moderate samples. This study investigates the impact of the bias in the LS estimator on the behavior of various types of bootstrap confidence intervals for the autoregressive coefficient and introduces methods for constructing bias-corrected bootstrap confidence intervals. We first describe several bootstrap confidence interval procedures for the autoregressive coefficient of the BAR model and present their bias-corrected versions. The behavior of uncorrected and corrected confidence interval procedures is studied empirically through extensive Monte Carlo simulations and two real cell lineage data applications. The empirical results show that the bias in the LS estimator can have a significant negative impact on the behavior of bootstrap confidence intervals and that bias correction can significantly improve the performance of bootstrap confidence intervals in terms of coverage, width, and symmetry.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Computing Pseudolikelihood Estimators for Exponential-Family Random Graph Models 计算指数族随机图模型的伪似然估计量
Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1094
Christian S. Schmid, David R. Hunter
{"title":"Computing Pseudolikelihood Estimators for Exponential-Family Random Graph Models","authors":"Christian S. Schmid, David R. Hunter","doi":"10.6339/23-jds1094","DOIUrl":"https://doi.org/10.6339/23-jds1094","url":null,"abstract":"The reputation of the maximum pseudolikelihood estimator (MPLE) for Exponential Random Graph Models (ERGM) has undergone a drastic change over the past 30 years. While first receiving broad support, mainly due to its computational feasibility and the lack of alternatives, general opinions started to change with the introduction of approximate maximum likelihood estimator (MLE) methods that became practicable due to increasing computing power and the introduction of MCMC methods. Previous comparison studies appear to yield contradicting results regarding the preference of these two point estimators; however, there is consensus that the prevailing method to obtain an MPLE’s standard error by the inverse Hessian matrix generally underestimates standard errors. We propose replacing the inverse Hessian matrix by an approximation of the Godambe matrix that results in confidence intervals with appropriate coverage rates and that, in addition, enables examining for model degeneracy. Our results also provide empirical evidence for the asymptotic normality of the MPLE under certain conditions.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Covid-19 Vaccine Efficacy: Accuracy Assessment, Comparison, and Caveats Covid-19疫苗有效性:准确性评估、比较和注意事项
Journal of data science : JDS Pub Date : 2023-01-01 DOI: 10.6339/23-jds1089
Wenjiang J. Fu, Jieni Li, P. Scheet
{"title":"Covid-19 Vaccine Efficacy: Accuracy Assessment, Comparison, and Caveats","authors":"Wenjiang J. Fu, Jieni Li, P. Scheet","doi":"10.6339/23-jds1089","DOIUrl":"https://doi.org/10.6339/23-jds1089","url":null,"abstract":"Vaccine efficacy is a key index to evaluate vaccines in initial clinical trials during the development of vaccines. In particular, it plays a crucial role in authorizing Covid-19 vaccines. It has been reported that Covid-19 vaccine efficacy varies with a number of factors, including demographics of population, time after vaccine administration, and virus strains. By examining clinical trial data of three Covid-19 vaccine studies, we find that current approach to evaluating vaccines with an overall efficacy does not provide desired accuracy. It requires no time frame during which a candidate vaccine is evaluated, and is subject to misuse, resulting in potential misleading information and interpretation. In particular, we illustrate with clinical trial data that the variability of vaccine efficacy is underestimated. We demonstrate that a new method may help to address these caveats. It leads to accurate estimation of the variation of efficacy, provides useful information to define a reasonable time frame to evaluate vaccines, and avoids misuse of vaccine efficacy and misleading information.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71320716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
The Second Competition on Spatial Statistics for Large Datasets 第二届大型数据集空间统计竞赛
Journal of data science : JDS Pub Date : 2022-11-06 DOI: 10.6339/22-jds1076
Sameh Abdulah, Faten S. Alamri, Pratik Nag, Ying Sun, H. Ltaief, D. Keyes, M. Genton
{"title":"The Second Competition on Spatial Statistics for Large Datasets","authors":"Sameh Abdulah, Faten S. Alamri, Pratik Nag, Ying Sun, H. Ltaief, D. Keyes, M. Genton","doi":"10.6339/22-jds1076","DOIUrl":"https://doi.org/10.6339/22-jds1076","url":null,"abstract":"In the last few decades, the size of spatial and spatio-temporal datasets in many research areas has rapidly increased with the development of data collection technologies. As a result, classical statistical methods in spatial statistics are facing computational challenges. For example, the kriging predictor in geostatistics becomes prohibitive on traditional hardware architectures for large datasets as it requires high computing power and memory footprint when dealing with large dense matrix operations. Over the years, various approximation methods have been proposed to address such computational issues, however, the community lacks a holistic process to assess their approximation efficiency. To provide a fair assessment, in 2021, we organized the first competition on spatial statistics for large datasets, generated by our ExaGeoStat software, and asked participants to report the results of estimation and prediction. Thanks to its widely acknowledged success and at the request of many participants, we organized the second competition in 2022 focusing on predictions for more complex spatial and spatio-temporal processes, including univariate nonstationary spatial processes, univariate stationary space-time processes, and bivariate stationary spatial processes. In this paper, we describe in detail the data generation procedure and make the valuable datasets publicly available for a wider adoption. Then, we review the submitted methods from fourteen teams worldwide, analyze the competition outcomes, and assess the performance of each team.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42045278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Vecchia Approximations and Optimization for Multivariate Matérn Models 多元数学模型的Vecchia逼近与优化
Journal of data science : JDS Pub Date : 2022-10-17 DOI: 10.6339/22-jds1074
Youssef A. Fahmy, J. Guinness
{"title":"Vecchia Approximations and Optimization for Multivariate Matérn Models","authors":"Youssef A. Fahmy, J. Guinness","doi":"10.6339/22-jds1074","DOIUrl":"https://doi.org/10.6339/22-jds1074","url":null,"abstract":"We describe our implementation of the multivariate Matérn model for multivariate spatial datasets, using Vecchia’s approximation and a Fisher scoring optimization algorithm. We consider various pararameterizations for the multivariate Matérn that have been proposed in the literature for ensuring model validity, as well as an unconstrained model. A strength of our study is that the code is tested on many real-world multivariate spatial datasets. We use it to study the effect of ordering and conditioning in Vecchia’s approximation and the restrictions imposed by the various parameterizations. We also consider a model in which co-located nuggets are correlated across components and find that forcing this cross-component nugget correlation to be zero can have a serious impact on the other model parameters, so we suggest allowing cross-component correlation in co-located nugget terms.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44042544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Geostatistics for Large Datasets on Riemannian Manifolds: A Matrix-Free Approach 黎曼流形上大数据集的地质统计学:无矩阵方法
Journal of data science : JDS Pub Date : 2022-08-26 DOI: 10.6339/22-jds1075
M. Pereira, N. Desassis, D. Allard
{"title":"Geostatistics for Large Datasets on Riemannian Manifolds: A Matrix-Free Approach","authors":"M. Pereira, N. Desassis, D. Allard","doi":"10.6339/22-jds1075","DOIUrl":"https://doi.org/10.6339/22-jds1075","url":null,"abstract":"Large or very large spatial (and spatio-temporal) datasets have become common place in many environmental and climate studies. These data are often collected in non-Euclidean spaces (such as the planet Earth) and they often present nonstationary anisotropies. This paper proposes a generic approach to model Gaussian Random Fields (GRFs) on compact Riemannian manifolds that bridges the gap between existing works on nonstationary GRFs and random fields on manifolds. This approach can be applied to any smooth compact manifolds, and in particular to any compact surface. By defining a Riemannian metric that accounts for the preferential directions of correlation, our approach yields an interpretation of the nonstationary geometric anisotropies as resulting from local deformations of the domain. We provide scalable algorithms for the estimation of the parameters and for optimal prediction by kriging and simulation able to tackle very large grids. Stationary and nonstationary illustrations are provided.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46734367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
EVIboost for the Estimation of Extreme Value Index Under Heterogeneous Extremes 非均匀极值下极值指数估计的EVIboost
Journal of data science : JDS Pub Date : 2022-05-28 DOI: 10.6339/22-jds1067
Jiaxi Wang, Yanxi Hou, Xingchi Li, Tiandong Wang
{"title":"EVIboost for the Estimation of Extreme Value Index Under Heterogeneous Extremes","authors":"Jiaxi Wang, Yanxi Hou, Xingchi Li, Tiandong Wang","doi":"10.6339/22-jds1067","DOIUrl":"https://doi.org/10.6339/22-jds1067","url":null,"abstract":"Modeling heterogeneity on heavy-tailed distributions under a regression framework is challenging, yet classical statistical methodologies usually place conditions on the distribution models to facilitate the learning procedure. However, these conditions will likely overlook the complex dependence structure between the heaviness of tails and the covariates. Moreover, data sparsity on tail regions makes the inference method less stable, leading to biased estimates for extreme-related quantities. This paper proposes a gradient boosting algorithm to estimate a functional extreme value index with heterogeneous extremes. Our proposed algorithm is a data-driven procedure capturing complex and dynamic structures in tail distributions. We also conduct extensive simulation studies to show the prediction accuracy of the proposed algorithm. In addition, we apply our method to a real-world data set to illustrate the state-dependent and time-varying properties of heavy-tail phenomena in the financial industry.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43556826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation 鲁棒可扩展的非参数多类概率估计的线性算法
Journal of data science : JDS Pub Date : 2022-05-25 DOI: 10.6339/22-jds1069
Liyun Zeng, Hao Helen Zhang
{"title":"Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation","authors":"Liyun Zeng, Hao Helen Zhang","doi":"10.6339/22-jds1069","DOIUrl":"https://doi.org/10.6339/22-jds1069","url":null,"abstract":"Multiclass probability estimation is the problem of estimating conditional probabilities of a data point belonging to a class given its covariate information. It has broad applications in statistical analysis and data science. Recently a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for K-class problems (Wu et al., 2010; Wang et al., 2019), where K is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in K. In this paper, we propose two new learning schemes, the baseline learning and the One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, the baseline learning has optimal computational complexity in the sense that it is linear in K. Though not the most efficient in computation, the OVA is found to have the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate their finite sample performance.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44600781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Incorporating Interventions to an Extended SEIRD Model with Vaccination: Application to COVID-19 in Qatar 将干预措施纳入扩展SEIRD模型与疫苗接种:在卡塔尔COVID-19中的应用
Journal of data science : JDS Pub Date : 2022-04-23 DOI: 10.6339/23-JDS1105
Elizabeth B Amona, R. Ghanam, E. Boone, Indranil Sahoo, L. Abu-Raddad
{"title":"Incorporating Interventions to an Extended SEIRD Model with Vaccination: Application to COVID-19 in Qatar","authors":"Elizabeth B Amona, R. Ghanam, E. Boone, Indranil Sahoo, L. Abu-Raddad","doi":"10.6339/23-JDS1105","DOIUrl":"https://doi.org/10.6339/23-JDS1105","url":null,"abstract":"The COVID-19 outbreak of 2020 has required many governments to develop and adopt mathematical-statistical models of the pandemic for policy and planning purposes. To this end, this work provides a tutorial on building a compartmental model using Susceptible, Exposed, Infected, Recovered, Deaths and Vaccinated (SEIRDV) status through time. The proposed model uses interventions to quantify the impact of various government attempts made to slow the spread of the virus. Furthermore, a vaccination parameter is also incorporated in the model, which is inactive until the time the vaccine is deployed. A Bayesian framework is utilized to perform both parameter estimation and prediction. Predictions are made to determine when the peak Active Infections occur. We provide inferential frameworks for assessing the effects of government interventions on the dynamic progression of the pandemic, including the impact of vaccination. The proposed model also allows for quantification of number of excess deaths averted over the study period due to vaccination.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43542129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Causal Discovery for Observational Sciences Using Supervised Machine Learning 使用监督机器学习的观察科学因果发现
Journal of data science : JDS Pub Date : 2022-02-25 DOI: 10.6339/23-jds1088
A. H. Petersen, J. Ramsey, C. Ekstrøm, P. Spirtes
{"title":"Causal Discovery for Observational Sciences Using Supervised Machine Learning","authors":"A. H. Petersen, J. Ramsey, C. Ekstrøm, P. Spirtes","doi":"10.6339/23-jds1088","DOIUrl":"https://doi.org/10.6339/23-jds1088","url":null,"abstract":"Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models. Causal discovery algorithms are empirical methods for constructing such causal models from data. Several asymptotically correct discovery methods already exist, but they generally struggle on smaller samples. Moreover, most methods focus on very sparse causal models, which may not always be a realistic representation of real-life data generating mechanisms. Finally, while causal relationships suggested by the methods often hold true, their claims about causal non-relatedness have high error rates. This non-conservative error trade off is not ideal for observational sciences, where the resulting model is directly used to inform causal inference: A causal model with many missing causal relations entails too strong assumptions and may lead to biased effect estimates. We propose a new causal discovery method that addresses these three shortcomings: Supervised learning discovery (SLdisco). SLdisco uses supervised machine learning to obtain a mapping from observational data to equivalence classes of causal models. We evaluate SLdisco in a large simulation study based on Gaussian data and we consider several choices of model size and sample size. We find that SLdisco is more conservative, only moderately less informative and less sensitive towards sample size than existing procedures. We furthermore provide a real epidemiological data application. We use random subsampling to investigate real data performance on small samples and again find that SLdisco is less sensitive towards sample size and hence seems to better utilize the information available in small datasets.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45032104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信