The New England Journal of Statistics in Data Science: Latest Articles

Modeling Multivariate Spatial Dependencies Using Graphical Models
The New England Journal of Statistics in Data Science. Pub Date: 2023-09-01; Epub Date: 2023-09-06; DOI: 10.51387/23-nejsds47
Debangan Dey, Abhirup Datta, Sudipto Banerjee
Abstract: Graphical models have seen significant growth and use in spatial data science for modeling data referenced over a massive number of spatial-temporal coordinates. Much of this literature has focused on a single or relatively few spatially dependent outcomes. Recent attention has turned to modeling and inference for a substantially large number of outcomes. While spatial factor models and multivariate basis expansions occupy a prominent place in this domain, this article elucidates a recent approach, graphical Gaussian processes, that exploits conditional independence among a very large number of spatial processes to build scalable graphical models for fully model-based Bayesian analysis of multivariate spatial data.
Volume 1, Issue 2, pp. 283-295. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10563032/pdf/nihms-1934371.pdf
Citations: 1
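The conditional-independence principle that graphical models rely on can be seen in a toy setting. The sketch below is an illustration under simplifying assumptions and is not the graphical Gaussian process construction from the article. It uses three scalar Gaussian variables instead of spatial processes, and both the chain-shaped precision matrix and the helper name `partial_correlations` are hypothetical. In a Gaussian graphical model, variables i and j are conditionally independent given the others exactly when entry (i, j) of the precision matrix is zero, even if their marginal covariance is nonzero.

```python
import numpy as np

def partial_correlations(precision: np.ndarray) -> np.ndarray:
    """Partial correlation matrix implied by a Gaussian precision matrix Q:
    rho_ij = -Q_ij / sqrt(Q_ii * Q_jj) for i != j (diagonal set to 1)."""
    q = np.asarray(precision, dtype=float)
    d = np.sqrt(np.diag(q))
    pc = -q / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

# Hypothetical chain graph 0 - 1 - 2: there is no edge between variables 0 and 2,
# so Q[0, 2] = 0 and they are conditionally independent given variable 1.
Q = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
cov = np.linalg.inv(Q)          # the marginal covariance is dense
pc = partial_correlations(Q)    # but the partial correlation of 0 and 2 is zero

print("marginal cov[0, 2]:", cov[0, 2])
print("partial corr[0, 2]:", pc[0, 2])
```

The sparsity sits in the precision matrix, not the covariance. Sparse precision structure of this kind is what graphical models typically exploit for computation.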
Effect of model space priors on statistical inference with model uncertainty
The New England Journal of Statistics in Data Science. Pub Date: 2023-09-01; Epub Date: 2022-11-16
Anupreet Porwal, Adrian E Raftery
Abstract: Bayesian model averaging (BMA) provides a coherent way to account for model uncertainty in statistical inference tasks. BMA requires specification of model space priors and parameter space priors. In this article we focus on comparing different model space priors in the presence of model uncertainty. We consider eight reference model space priors used in the literature and three adaptive parameter priors recommended by Porwal and Raftery [37]. We assess the performance of these combinations of prior specifications for variable selection in linear regression models for the statistical tasks of parameter estimation, interval estimation, inference, and point and interval prediction. We carry out an extensive simulation study based on 14 real datasets representing a range of situations encountered in practice. We found that beta-binomial model space priors specified in terms of the prior probability of model size performed best on average across various statistical tasks and datasets, outperforming priors that were uniform across models. Recently proposed complexity priors performed relatively poorly.
Volume 1, Issue 2, pp. 149-158. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11482600/pdf/
Citations: 0
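A short sketch makes the contrast between a beta-binomial model space prior and a prior that is uniform over models concrete. It assumes p candidate variables, each included independently with a common probability w, where w is given a Beta(a, b) prior and integrated out. This is the standard beta-binomial construction. The hyperparameter values and function names are illustrative and are not taken from the paper. Under this construction, a specific model containing k variables has prior probability B(a + k, b + p - k) / B(a, b).

```python
from math import comb, exp, lgamma

def log_beta(x: float, y: float) -> float:
    """Log of the Beta function B(x, y), computed via log-gamma for stability."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def beta_binomial_model_prior(k: int, p: int, a: float = 1.0, b: float = 1.0) -> float:
    """Prior probability of one particular model that includes k of p candidate
    variables, with inclusion probability w ~ Beta(a, b) integrated out."""
    return exp(log_beta(a + k, b + p - k) - log_beta(a, b))

def uniform_model_prior(p: int) -> float:
    """Prior probability of any one model when all 2**p models are equally likely."""
    return 0.5 ** p

def implied_size_prior(model_prior: float, k: int, p: int) -> float:
    """Prior probability that the model size equals k (comb(p, k) models share that size)."""
    return comb(p, k) * model_prior

p = 10
for k in (0, 5, 10):
    bb = implied_size_prior(beta_binomial_model_prior(k, p), k, p)
    un = implied_size_prior(uniform_model_prior(p), k, p)
    print(f"size {k:2d}: beta-binomial(1,1) {bb:.4f}   uniform over models {un:.4f}")
```

With a = b = 1, every model size from 0 to p receives prior mass 1/(p + 1). The uniform prior over models instead concentrates mass on sizes near p/2. This is the sense in which the beta-binomial prior is specified in terms of model size.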
Bayesian Variable Selection in Double Generalized Linear Tweedie Spatial Process Models
The New England Journal of Statistics in Data Science. Pub Date: 2023-06-19; DOI: 10.51387/23-NEJSDS37
Aritra Halder, Shariq Mohammed, D. Dey
Abstract: Double generalized linear models provide a flexible framework for modeling data by allowing both the mean and the dispersion to vary across observations. Common members of the exponential dispersion family, including the Gaussian, Poisson, compound Poisson-gamma (CP-g), gamma and inverse-Gaussian distributions, are known to admit such models. Their limited use can be attributed to ambiguities in model specification when there are many covariates, and to complications that arise when data display complex spatial dependence. In this work we consider a hierarchical specification for the CP-g model with a spatial random effect. The spatial effect is targeted at uncertainty quantification: it models dependence in the data arising from location-based indexing of the response. We focus on a Gaussian process specification for the spatial effect. Simultaneously, we tackle the problem of model specification for such models using Bayesian variable selection, effected through a continuous spike-and-slab prior on the model parameters, specifically the fixed effects. The novelty of our contribution lies in the Bayesian frameworks developed for such models. We perform various synthetic experiments to showcase the accuracy of our frameworks, which are then applied to analyze automobile insurance premiums in Connecticut for the year 2008.
Citations: 1
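The continuous spike-and-slab prior mentioned in this abstract can be sketched in its generic form, in the style of stochastic search variable selection. Each fixed effect beta comes either from a narrow "spike" N(0, v0) or from a wide "slab" N(0, v1), and w is the prior slab probability. The values of v0, v1 and w, and all function names, are illustrative assumptions; the paper's exact hierarchy for the CP-g model may differ. `inclusion_probability` returns P(gamma = 1 | beta), which is the conditional probability a Gibbs sampler would use to update an inclusion indicator.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, var: float) -> float:
    """Density of N(0, var) evaluated at x."""
    return exp(-0.5 * x * x / var) / sqrt(2.0 * pi * var)

def spike_slab_density(beta: float, w: float, v0: float, v1: float) -> float:
    """Marginal prior density of one coefficient under a continuous spike-and-slab
    prior: (1 - w) * N(0, v0) + w * N(0, v1), with v0 much smaller than v1."""
    return (1.0 - w) * normal_pdf(beta, v0) + w * normal_pdf(beta, v1)

def inclusion_probability(beta: float, w: float, v0: float, v1: float) -> float:
    """P(gamma = 1 | beta): the probability that the coefficient came from the slab."""
    slab = w * normal_pdf(beta, v1)
    spike = (1.0 - w) * normal_pdf(beta, v0)
    return slab / (slab + spike)

# Coefficients near zero are attributed to the spike (excluded);
# coefficients far from zero are attributed to the slab (included).
for beta in (0.0, 0.1, 0.5, 2.0):
    print(beta, round(inclusion_probability(beta, w=0.5, v0=0.01, v1=10.0), 4))
```

Because both components are continuous, the prior never sets a coefficient exactly to zero. Variable selection instead comes from thresholding posterior inclusion probabilities.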
Bayesian D-Optimal Design of Experiments with Quantitative and Qualitative Responses
The New England Journal of Statistics in Data Science. Pub Date: 2023-04-18; DOI: 10.51387/23-nejsds30
Lulu Kang, Xinwei Deng, R. Jin
Abstract: Systems with both quantitative and qualitative responses are widely encountered in many applications. Design-of-experiments methods are needed when experiments are conducted to study such systems. Classic experimental design methods are unsuitable here because they often focus on one type of response. In this paper, we develop a Bayesian D-optimal design method for experiments with one continuous and one binary response. Both noninformative and conjugate informative prior distributions on the unknown parameters are considered. The proposed design criterion has meaningful interpretations regarding D-optimality for the models of both types of response. An efficient point-exchange search algorithm is developed to construct local D-optimal designs for given parameter values. Global D-optimal designs are obtained by accumulating the frequencies of the design points in local D-optimal designs, where the parameters are sampled from the prior distributions. The performance of the proposed methods is evaluated through two examples.
Citations: 1
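The paper's criterion combines a continuous and a binary response, and the sketch below does not attempt that. It only illustrates the generic point-exchange idea for an ordinary linear model with regression vector f(x) and the classical D-criterion det(X^T X). Each design point is swapped in turn for whichever candidate point most increases the determinant, and the search stops when a full pass yields no improvement. Because this model is linear in its parameters, the "local" design here does not depend on parameter values. The candidate grid, the model, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

def determinant(m: List[List[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0                      # singular information matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def information_det(design: Sequence[float], f: Callable[[float], List[float]]) -> float:
    """D-criterion det(X^T X), where row i of X is the regression vector f(x_i)."""
    rows = [f(x) for x in design]
    k = len(rows[0])
    info = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    return determinant(info)

def point_exchange(design: Sequence[float], candidates: Sequence[float],
                   f: Callable[[float], List[float]],
                   max_passes: int = 100) -> Tuple[List[float], float]:
    """Greedy point exchange: replace each design point by the candidate that
    most increases det(X^T X); stop after a pass with no improvement."""
    design = list(design)
    current = information_det(design, f)
    for _ in range(max_passes):
        improved = False
        for i in range(len(design)):
            best_x, best_val = design[i], current
            for c in candidates:
                val = information_det(design[:i] + [c] + design[i + 1:], f)
                if val > best_val * (1.0 + 1e-12) + 1e-15:
                    best_x, best_val = c, val
            if best_x != design[i]:
                design[i], current = best_x, best_val
                improved = True
        if not improved:
            break
    return design, current

grid = [-1.0 + 0.25 * i for i in range(9)]          # hypothetical candidate set on [-1, 1]
quadratic = lambda x: [1.0, x, x * x]               # assumed model: b0 + b1*x + b2*x^2
design, crit = point_exchange([-0.5, 0.0, 0.5], grid, quadratic)
print(sorted(design), crit)
```

For the quadratic model on this grid, the search ends at the points -1, 0 and 1. The global Bayesian step described in the abstract would repeat a local search for parameter draws from the prior and tally how often each candidate point appears.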
Bayesian Interim Analysis in Basket Trials
The New England Journal of Statistics in Data Science. Pub Date: 2023-01-01; DOI: 10.51387/23-nejsds48
Cheng Huang, Chenghao Chu, Yimeng Lu, Bingming Yi, Ming-Hui Chen
Abstract: Basket trials have attracted much attention in oncology research in recent years, as advances in health technology have made it possible to classify patients at the genomic level. Bayesian methods are particularly prevalent in basket trials because a hierarchical structure can be adapted to them to allow for information borrowing. In this article, we extend Bayesian methods to basket trials with treatment and control arms and continuous endpoints, as is often the case in clinical trials for rare diseases. To account for imbalance in covariates that are potentially strong predictors but are not stratified in a randomized trial, our models adjust for these covariates and allow different coefficients across baskets. In addition, two-stage and one-stage designs are compared for the four Bayesian methods. Extensive simulation studies are conducted to examine the empirical performance of all models under consideration, and a real data analysis further demonstrates the usefulness of the Bayesian methods.
Citations: 0
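Information borrowing through a hierarchical structure can be sketched with a normal-normal model. The sketch assumes that each basket's estimated treatment effect y_j has a known sampling variance s2_j, that the true effects share a common mean mu with between-basket variance tau2 held fixed, and that mu has a flat prior. Under these assumptions, the posterior mean of each basket's effect is a weighted average of the basket's own estimate and the precision-weighted pooled mean. The sketch does not reproduce the paper's four methods, its covariate adjustment, or its interim decision rules. The function name and the data are hypothetical.

```python
from typing import List, Sequence

def borrow_information(y: Sequence[float], s2: Sequence[float], tau2: float) -> List[float]:
    """Posterior means of basket effects theta_j under
    y_j ~ N(theta_j, s2_j), theta_j ~ N(mu, tau2), flat prior on mu, tau2 fixed.
    Requires s2_j + tau2 > 0 for every basket."""
    weights = [1.0 / (v + tau2) for v in s2]
    mu_hat = sum(w * yj for w, yj in zip(weights, y)) / sum(weights)   # pooled mean
    posterior_means = []
    for yj, v in zip(y, s2):
        shrink = v / (v + tau2)          # weight placed on the pooled mean
        posterior_means.append(shrink * mu_hat + (1.0 - shrink) * yj)
    return posterior_means

y = [0.8, 1.5, 0.2, 1.1]        # hypothetical basket-level effect estimates
s2 = [0.30, 0.10, 0.50, 0.20]   # hypothetical sampling variances
print(borrow_information(y, s2, tau2=0.05))
```

A small tau2 pulls every basket toward the pooled mean, which is strong borrowing. A large tau2 leaves each estimate close to its own data. Baskets with noisier estimates (larger s2_j) are shrunk the most.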
Detection of Anomalies in Traffic Flows with Large Amounts of Missing Data
The New England Journal of Statistics in Data Science. Pub Date: 2023-01-01; DOI: 10.51387/23-nejsds20
Qing He, Charles W. Harrison, Hsin-Hsiung Huang
Abstract: Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets prevents anomaly detection algorithms from learning characteristic rules and patterns, because large amounts of data are absent. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge. The scheme is based on Gaussian process models that generate features for a logistic regression model, and it achieves high prediction accuracy on sparse traffic flow data with a large proportion of missing values. The dataset, provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor was purposely downsampled by NSF and NGA to simulate data missing completely at random, with missing rates of 99%, 98%, 95%, and 90%, which makes detecting anomalies from the sparse traffic flow data challenging. The proposed scheme uses traffic patterns at different times of day and on different days of the week to recover the complete data. It is computationally efficient because it allows parallel computation across sensors. The proposed method was one of the two top-performing algorithms in the 2021 ATD challenge.
Citations: 1
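One way a Gaussian process can help with sparse sensor data is to fill gaps in a single sensor's series and turn each reading into a feature, such as its standardized distance from the GP prediction. The sketch below shows that idea. It is not the authors' scheme. It assumes a squared-exponential kernel over a single time axis, fixed hyperparameters, and the sample mean as a plug-in prior mean. The time points, counts, and function names are hypothetical. Features of this kind could then be passed to a logistic regression classifier.

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float, variance: float) -> np.ndarray:
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_impute(t_obs, y_obs, t_new, length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and standard deviation of a GP fitted to (t_obs, y_obs),
    evaluated at the missing time points t_new. The sample mean of y_obs
    serves as a constant prior mean (a simplifying assumption)."""
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    t_new = np.asarray(t_new, dtype=float)
    m = y_obs.mean()
    K = rbf_kernel(t_obs, t_obs, length_scale, variance) + noise * np.eye(t_obs.size)
    K_s = rbf_kernel(t_new, t_obs, length_scale, variance)    # shape (n_new, n_obs)
    mean = m + K_s @ np.linalg.solve(K, y_obs - m)
    v = np.linalg.solve(K, K_s.T)                             # shape (n_obs, n_new)
    var = variance - np.sum(K_s * v.T, axis=1)                # predictive variance
    return mean, np.sqrt(np.maximum(var, 0.0))

def standardized_residual(y, mean, sd, noise=1e-2):
    """Hypothetical anomaly feature: distance of a reading from the GP prediction."""
    return (np.asarray(y, dtype=float) - mean) / np.sqrt(sd ** 2 + noise)

# Hypothetical hourly counts for one sensor, with hours 3 and 4 missing.
t_obs = [0, 1, 2, 5, 6, 7]
y_obs = [10.0, 12.0, 15.0, 14.0, 11.0, 9.0]
mean, sd = gp_impute(t_obs, y_obs, [3, 4], length_scale=1.5, variance=9.0, noise=0.5)
print(mean, sd)
```

Each sensor is fitted independently here, which matches the abstract's remark that computation can run in parallel across sensors.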
Editorial. Modern Bayesian Methods with Applications in Data Science
The New England Journal of Statistics in Data Science. Pub Date: 2023-01-01; DOI: 10.51387/23-nejsds12edi
Dipak K. Dey, Ming-Hui Chen, Min-ge Xie, HaiYing Wang, Jing Wu
Publisher: New England Statistical Society
Citations: 0
Bayesian Simultaneous Partial Envelope Model with Application to an Imaging Genetics Analysis
The New England Journal of Statistics in Data Science. Pub Date: 2023-01-01; DOI: 10.51387/23-nejsds23
Yanbo Shen, Yeonhee Park, Saptarshi Chakraborty, Chunming Zhang
Abstract: As a prominent dimension reduction method for multivariate linear regression, the envelope model has received increased attention over the past decade due to its modeling flexibility and its success in improving estimation and prediction efficiency. Several enveloping approaches have been proposed in the literature. Among these, two are noteworthy: the partial response envelope model [57], which focuses on enveloping only the coefficients for predictors of interest, and the simultaneous envelope model [14], which combines the predictor and response envelope models within a unified modeling framework. In this article we incorporate these two approaches within a Bayesian framework and propose a novel Bayesian simultaneous partial envelope model that generalizes both approaches and addresses some of their limitations. Our method offers the flexibility of incorporating prior information when available, and supports coherent quantification of all modeling uncertainty through the posterior distribution of the model parameters. A block Metropolis-within-Gibbs algorithm is developed for Markov chain Monte Carlo (MCMC) sampling from the posterior. The utility of our model is corroborated by theoretical results, comprehensive simulations, and a real imaging genetics data application for the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study.
Citations: 1
Highest Posterior Model Computation and Variable Selection via Simulated Annealing
The New England Journal of Statistics in Data Science. Pub Date: 2023-01-01; DOI: 10.51387/23-nejsds40
A. Maity, S. Basu
Abstract: Variable selection is widely used in all application areas of data analytics. Examples range from optimal selection of genes in large-scale microarray studies, to optimal selection of biomarkers for targeted therapy in cancer genomics, to selection of optimal predictors in business analytics. A formal way to perform this selection under the Bayesian approach is to select the model with the highest posterior probability. The problem may be thought of as an optimization problem over the model space, where the objective function is the posterior probability of the model. We propose to carry out this optimization using simulated annealing, and we illustrate its feasibility in high-dimensional problems. Various simulation studies show this new approach to be efficient. Theoretical justifications are provided, and applications to high-dimensional datasets are discussed. The proposed method is implemented in the R package sahpm for general use, which is available on CRAN.
Citations: 1
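The search described in this abstract can be sketched as simulated annealing over binary inclusion vectors. This is a generic sketch and does not reproduce the interface or exact algorithm of the sahpm package. The single-flip neighbourhood, the geometric cooling schedule, the null starting model, and the parameter names are all assumptions. `log_score` stands for any function that returns the log posterior probability of a model up to an additive constant, for example a marginal-likelihood or BIC-based approximation.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

def anneal_model_search(
    log_score: Callable[[Sequence[int]], float],
    p: int,
    n_iter: int = 5000,
    t0: float = 1.0,
    cooling: float = 0.999,
    seed: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Search gamma in {0,1}^p for the maximizer of log_score(gamma).

    Each move flips one randomly chosen variable in or out of the model. A move
    that lowers the score by d is accepted with probability exp(-d / T), and
    the temperature T decreases geometrically. The best model visited is returned."""
    rng = random.Random(seed)
    current = [0] * p                          # start from the null model
    cur_score = log_score(current)
    best, best_score = current[:], cur_score
    temp = t0
    for _ in range(n_iter):
        j = rng.randrange(p)
        proposal = current[:]
        proposal[j] = 1 - proposal[j]          # add or drop variable j
        new_score = log_score(proposal)
        delta = new_score - cur_score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = proposal, new_score
            if cur_score > best_score:
                best, best_score = current[:], cur_score
        temp = max(temp * cooling, 1e-8)       # floor avoids division by zero
    return best, best_score

# Toy stand-in for a log posterior: each variable adds a fixed amount if included.
coef = [2.0, -1.0, 3.0, -2.0, 0.5]
toy_score = lambda g: sum(c * gi for c, gi in zip(coef, g))
best_model, best_value = anneal_model_search(toy_score, len(coef), seed=1)
print(best_model, best_value)
```

High early temperatures let the chain accept downhill moves and escape local optima. As the temperature falls, the search becomes greedy and settles on a high-posterior model. In real use, the toy score would be replaced by an approximation to the log posterior model probability.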
Algorithm-Based Optimal and Efficient Exact Experimental Designs for Crossover and Interference Models
The New England Journal of Statistics in Data Science. Pub Date: 2023-01-01; DOI: 10.51387/23-nejsds41
S. Hao, Min Yang, Weiwei Zheng
Abstract: Crossover models and interference models are frequently used in clinical trials, agricultural studies, social studies, and other fields. While some theoretical optimality results are available, applying them in practice remains challenging. Because of the complexity of exact optimal designs, the available theoretical results typically require specific combinations of the number of treatments (t), periods (p), and subjects (n). A more flexible approach is to formulate integer programs based on approximate design theory, which can handle general cases of (t, p, n). Nonetheless, those results are generally derived for specific models or design problems, and new effort is needed for each new problem. These obstacles make the theoretical results rather difficult to apply. Here we propose a new algorithm, a revision of the optimal weight exchange algorithm by [1]. It quickly provides efficient crossover designs under various situations: for different optimality criteria, different parameters of interest, different configurations of (t, p, n), and arbitrary dropout scenarios. To facilitate use of the algorithm, we have developed a corresponding R package and an R Shiny app as a more user-friendly interface.
Citations: 1