Journal of Applied Statistics: latest articles

The spike-and-slab lasso and scalable algorithm to accommodate multinomial outcomes in variable selection problems
Zone 4, Mathematics
Journal of Applied Statistics Pub Date : 2023-09-14 DOI: 10.1080/02664763.2023.2258301
Justin M. Leach, Nengjun Yi, Inmaculada Aban, The Alzheimer's Disease Neuroimaging Initiative
Abstract: Spike-and-slab prior distributions are used to impose variable selection in Bayesian regression-style problems with many possible predictors. These priors are a mixture of two zero-centered distributions with differing variances, resulting in different shrinkage levels on parameter estimates based on whether they are relevant to the outcome. The spike-and-slab lasso assigns mixtures of double exponential distributions as priors for the parameters. This framework was initially developed for linear models, later developed for generalized linear models, and shown to perform well in scenarios requiring sparse solutions. Standard formulations of generalized linear models cannot immediately accommodate categorical outcomes with > 2 categories, i.e. multinomial outcomes, and require modifications to model specification and parameter estimation. Such modifications are relatively straightforward in a classical setting but require additional theoretical and computational considerations in Bayesian settings, which can depend on the choice of prior distributions for the parameters of interest. While previous developments of the spike-and-slab lasso focused on continuous, count, and/or binary outcomes, we generalize the spike-and-slab lasso to accommodate multinomial outcomes, developing both the theoretical basis for the model and an expectation-maximization algorithm to fit the model. To our knowledge, this is the first generalization of the spike-and-slab lasso to allow for multinomial outcomes.
Keywords: Bayesian variable selection; spike-and-slab; generalized linear models; multinomial outcomes; elastic net
Disclosure statement: No potential conflict of interest was reported by the author(s).
Data availability statement: Code to reproduce the results of the simulation study and data analysis is available on GitHub (https://github.com/jmleach-bst/multinomial_ssnet_analyses). Note that while code for performing analysis on ADNI data is included, the ADNI data sets themselves are not, because we are not authorized to share data from ADNI. Details for access to these data can be found at http://adni.loni.usc.edu/data-samples/access-data/.
Funding: Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health [grant number U01 AG024904]) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Develop…
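To make the prior described in this abstract concrete, the following is a minimal Python sketch of a two-component mixture of zero-centered double exponential densities, together with the conditional probability that a coefficient belongs to the wider "slab" component. The function names, the mixing weight `theta`, and the scales `s0` (spike) and `s1` (slab) are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the authors' multinomial model or their EM algorithm.

```python
import math


def double_exponential_pdf(beta, scale):
    """Density of a zero-centered double exponential (Laplace) distribution."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def ssl_prior_pdf(beta, theta, s0, s1):
    """Mixture prior: spike DE(0, s0) with weight 1 - theta, slab DE(0, s1) with weight theta."""
    spike = (1.0 - theta) * double_exponential_pdf(beta, s0)
    slab = theta * double_exponential_pdf(beta, s1)
    return spike + slab


def slab_probability(beta, theta, s0, s1):
    """Conditional probability that beta came from the slab component (hypothetical helper)."""
    spike = (1.0 - theta) * double_exponential_pdf(beta, s0)
    slab = theta * double_exponential_pdf(beta, s1)
    return slab / (spike + slab)
```

With a narrow spike (for example `s0 = 0.04`) and a wide slab (`s1 = 1.0`), coefficients near zero are attributed mostly to the spike and are shrunk heavily, while large coefficients are attributed to the slab and are shrunk only lightly.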
Citations: 0
Prediction and model evaluation for space-time data.
IF 1.2, Zone 4, Mathematics
Journal of Applied Statistics Pub Date : 2023-09-03 eCollection Date: 2024-01-01 DOI: 10.1080/02664763.2023.2252208
G L Watson, C E Reid, M Jerrett, D Telesca
Abstract: Evaluation metrics for prediction error, model selection and model averaging on space-time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space-time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation.
Volume/issue: 1 1. Pages: 2007-2024. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11271132/pdf/
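As an illustration of the leave-one-location-out idea named in this abstract, here is a small Python sketch that holds out all observations from one location per fold. The helper names and the `fit`/`predict` callables are hypothetical and supplied by the caller; this is a generic grouped cross-validation loop under those assumptions, not the authors' evaluation code.

```python
from collections import defaultdict


def leave_one_location_out(locations):
    """Yield (held_out_location, train_idx, test_idx), one fold per distinct location."""
    by_location = defaultdict(list)
    for i, loc in enumerate(locations):
        by_location[loc].append(i)
    for loc, test_idx in by_location.items():
        # Every observation from the held-out location is excluded from training.
        train_idx = [i for i in range(len(locations)) if locations[i] != loc]
        yield loc, train_idx, test_idx


def lolo_mse(locations, y, fit, predict):
    """Mean squared prediction error pooled over all held-out observations."""
    squared_errors = []
    for _, train_idx, test_idx in leave_one_location_out(locations):
        model = fit(train_idx)
        for i in test_idx:
            squared_errors.append((y[i] - predict(model, i)) ** 2)
    return sum(squared_errors) / len(squared_errors)
```

Splitting by location rather than by individual observation keeps repeated measurements from the same site out of the training set, so the error estimate targets interpolation to a site that was never observed.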
Citations: 0
Editorial to the special issue: statistical perspectives on analytics for COVID-19 data.
IF 1.2, Zone 4, Mathematics
Journal of Applied Statistics Pub Date : 2023-07-28 eCollection Date: 2023-01-01 DOI: 10.1080/02664763.2023.2228597
Arnold Stromberg, Jie Chen, Teresa Paula Costa Azinheira Oliveira, Yichuan Zhao, Ramin Moghaddass, Milan Stehlik
Volume/issue: 50 11-12. Pages: 2287-2293. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10388801/pdf/
Citations: 0
Finding groups in data: an introduction to cluster analysis
IF 1.5, Zone 4, Mathematics
Journal of Applied Statistics Pub Date : 2023-06-04 DOI: 10.1080/02664763.2023.2220087
Soumita Modak
Citations: 275
Combining phenotypic and genomic data to improve prediction of binary traits
Zone 4, Mathematics
Journal of Applied Statistics Pub Date : 2023-05-16 DOI: 10.1080/02664763.2023.2208773
D. Jarquin, A. Roy, B. Clarke, S. Ghosal
Abstract: Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here 'main traits') of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or 'phenotypes') that are easy to measure. Our goal is to improve prediction of main traits with interpretable relations by combining the two data types using variable selection techniques. However, the genomic characteristics can overwhelm the set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation from both the secondary traits and the genotypic variables for optimal prediction. When two data types (markers and secondary traits) are available, we achieve improved prediction of a binary trait by two steps that are designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before accounting for extra effects of genotypes. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals to obtain the effects of phenotypic variables as adjusted by the genotypic variables. Then, we develop a sparse logistic classifier using the markers and residuals so that the adjusted phenotypes may be selected first to avoid being overwhelmed by the genotypic variables due to their numerical advantage. This classifier uses forward selection aided by a penalty term and can be computed effectively by a technique called the one-pass method. It compares favorably with other classifiers on simulated and real data.
Citations: 0
Knockoff procedure for false discovery rate control in high-dimensional data streams.
IF 1.5, Zone 4, Mathematics
Journal of Applied Statistics Pub Date : 2023-05-15 eCollection Date: 2023-01-01 DOI: 10.1080/02664763.2023.2200496
Ka Wai Tsang, Fugee Tsung, Zhihao Xu
Abstract: Motivated by applications to root-cause identification of faults in high-dimensional data streams that may have very limited samples after faults are detected, we consider multiple testing in models for multivariate statistical process control (SPC). With quick fault detection, only a small portion of data streams being out-of-control (OC) can be assumed. It is a long-standing problem to identify those OC data streams while controlling the number of false discoveries. It is challenging due to the limited number of OC samples after the termination of the process when faults are detected. Although several false discovery rate (FDR) controlling methods have been proposed, people may prefer other methods for quick detection. With a recently developed method called Knockoff filtering, we propose a knockoff procedure that can combine with other fault detection methods in the sense that the knockoff procedure does not change the stopping time, but may identify another set of faults to control FDR. A theorem for the FDR control of the proposed procedure is provided. Simulation studies show that the proposed procedure can control FDR while maintaining high power. We also illustrate the performance in an application to semiconductor manufacturing processes that motivated this development.
Volume/issue: 50 14. Pages: 2970-2983. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10557548/pdf/
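For readers unfamiliar with knockoff filtering, the sketch below shows the generic data-dependent threshold rule (often called knockoff+) that selects variables from a vector of signed statistics `w` at a target FDR level `q`. It assumes the statistics have already been computed so that large positive values suggest a real signal; the function names are illustrative, and this is the general selection rule rather than the specific procedure for data streams proposed in the paper.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t among the nonzero |w_j| with (1 + #{w_j <= -t}) / max(1, #{w_j >= t}) <= q."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    # No threshold meets the target level, so nothing is selected.
    return float("inf")


def knockoff_select(w, q):
    """Indices whose statistic is at or above the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]
```

The count of large negative statistics acts as an estimate of how many large positive statistics are false discoveries, which is why the ratio in the threshold rule serves as an estimated false discovery proportion.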
Citations: 0
Detection and estimation of multiple transient changes.
IF 1.5, Zone 4, Mathematics
Journal of Applied Statistics Pub Date : 2023-03-13 eCollection Date: 2023-01-01 DOI: 10.1080/02664763.2023.2174257
Michael Baron, Sergey V Malov
Abstract: Change-point detection methods are proposed for the case of temporary failures, or transient changes, when an unexpected disorder is ultimately followed by a re-adjustment and return to the initial state. A base distribution of the 'in-control' state changes to an 'out-of-control' distribution for unknown periods of time. Likelihood based sequential and retrospective tools are proposed for the detection and estimation of each pair of change-points. The accuracy of the obtained change-point estimates is assessed. Proposed methods offer simultaneous control of the familywise false alarm and false re-adjustment rates at the pre-chosen levels.
Volume/issue: 50 14. Pages: 2862-2888. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10557625/pdf/
Citations: 0
Modeling the spatial patterns of antenatal care utilization in Nigeria with inference based on Pólya-Gamma mixtures.
IF 1.5, Zone 4, Mathematics
Journal of Applied Statistics Pub Date : 2023-02-23 eCollection Date: 2024-01-01 DOI: 10.1080/02664763.2022.2164561
Osafu Augustine Egbon, Ezra Gayawan
Abstract: Despite the vast advantages of making antenatal care visits, the service utilization among pregnant women in Nigeria is suboptimal. A five-year monitoring estimate indicated that about 24% of the women who had live births made no visit. The non-utilization induced excessive zeroes in the outcome of interest. Thus, this study adopted a zero-inflated negative binomial model within a Bayesian framework to identify the spatial pattern and the key factors hindering antenatal care utilization in Nigeria. We overcome the intractability associated with posterior inference by adopting a Pólya-Gamma data-augmentation technique to facilitate inference. The Gibbs sampling algorithm was used to draw samples from the joint posterior distribution. Results revealed that type of place of residence, maternal level of education, access to mass media, household work index, and woman's working status have significant effects on the use of antenatal care services. Findings identified substantial state-level spatial disparity in antenatal care utilization across the country. Cost-effective techniques to achieve an acceptable frequency of utilization include the creation of community-specific awareness to emphasize the importance and benefits of appropriate utilization. Special consideration should be given to older pregnant women, women in poor antenatal utilization states, and women residing in poor road network regions.
Volume/issue: 1 1. Pages: 866-890. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10956928/pdf/
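The zero-inflated negative binomial model mentioned in this abstract can be illustrated with its probability mass function. The sketch below assumes the common parameterization with a structural-zero probability `pi`, a negative binomial mean `mu > 0`, and a dispersion (size) parameter `r > 0`; these symbols and function names are assumptions for illustration, and the sketch covers neither the spatial effects nor the Pólya-Gamma Gibbs sampler used in the study.

```python
import math


def nb_log_pmf(y, mu, r):
    """Log pmf of a negative binomial with mean mu and dispersion (size) r."""
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def zinb_pmf(y, pi, mu, r):
    """P(Y = y) when a structural zero occurs with probability pi."""
    nb = math.exp(nb_log_pmf(y, mu, r))
    if y == 0:
        # Zeros arise either structurally or from the count component.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb
```

The extra mass at zero is what lets the model absorb a large share of women reporting no visits without distorting the fitted distribution of positive visit counts.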
Citations: 0
On regime changes of COVID-19 outbreak.
IF 1.2, Zone 4, Mathematics
Journal of Applied Statistics Pub Date : 2023-02-13 eCollection Date: 2023-01-01 DOI: 10.1080/02664763.2023.2177625
A Tchorbadjieff, L P Tomov, V Velev, G Dezhov, V Manev, P Mayster
Abstract: The COVID-19 pandemic has had a very serious impact on societies and caused large-scale economic changes and a high death toll worldwide. The first cases were detected in China, but the virus soon spread worldwide, and the intensity of newly reported infections grew high during this initial period almost everywhere. Later, despite all imposed measures, the intensity shifted abruptly multiple times during the two-year period between 2020 and 2022, causing waves of very high infection rates in almost every part of the world. To address this problem, we model the heterogeneity in the data as multiple consecutive regime changes. The research study includes the development of a model based on automatic regime change detection and its combination with the linear birth-death process for long-run data fits. The results are empirically verified on data for 38 countries and US states for the period from February 2020 to April 2022. Finally, the initial phase (conditions) properties of infection development are studied.
Volume/issue: 50 11-12. Pages: 2343-2359. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10388815/pdf/
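As a rough illustration of combining regime changes with a linear birth-death process, the sketch below computes the expected population size when the birth and death rates are constant within each regime, using the standard result that the mean of a linear birth-death process grows as exp((birth rate - death rate) * t). The regime boundaries and rates are supplied by the caller as hypothetical inputs; the automatic regime detection described in the abstract is not reproduced here.

```python
import math


def expected_trajectory(n0, regimes, dt=1.0):
    """Expected size over time for piecewise-constant rates.

    regimes: list of (duration, birth_rate, death_rate) tuples, applied in order.
    Returns a list of (time, expected_size) pairs starting at time 0.
    """
    t, mean = 0.0, float(n0)
    trajectory = [(t, mean)]
    for duration, birth_rate, death_rate in regimes:
        steps = int(round(duration / dt))
        for _ in range(steps):
            # Within a regime the mean grows or decays exponentially.
            mean *= math.exp((birth_rate - death_rate) * dt)
            t += dt
            trajectory.append((t, mean))
    return trajectory
```

For example, a growth regime followed by a decay regime with mirrored rates brings the expected size back to its starting value, which gives a simple picture of one wave.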
Citations: 0
Smoothing regression and impact measures for accidents of traffic flows
Zone 4, Mathematics
Journal of Applied Statistics Pub Date : 2023-02-10 DOI: 10.1080/02664763.2023.2175799
Zhou Yu, Jie Yang, Hsin-Hsiung Huang
Abstract: Traffic pattern identification and accident evaluation are essential for improving traffic planning, road safety, and traffic management. In this paper, we establish classification and regression models to characterize the relationship between traffic flows and different time points and identify different patterns of traffic flows by a negative binomial model with smoothing splines. It provides mean response curves and Bayesian credible bands for traffic flows, a single index, and the log-likelihood difference, for traffic flow pattern recognition. We further propose an impact measure for evaluating the influence of accidents on traffic flows based on the fitted negative binomial model. The proposed method has been successfully applied to real-world traffic flows, and it can be used for improving traffic management.
Citations: 0