The spike-and-slab lasso and scalable algorithm to accommodate multinomial outcomes in variable selection problems

Impact factor: 1.2; Zone 4 (Mathematics); JCR Q2 (Statistics & Probability)
Justin M. Leach, Nengjun Yi, Inmaculada Aban, The Alzheimer's Disease Neuroimaging Initiative
Journal of Applied Statistics. Published 2023-09-14. DOI: 10.1080/02664763.2023.2258301 (https://doi.org/10.1080/02664763.2023.2258301)
Citation count: 0

Abstract

Spike-and-slab prior distributions are used to impose variable selection in Bayesian regression-style problems with many possible predictors. These priors are a mixture of two zero-centered distributions with differing variances, resulting in different shrinkage levels on parameter estimates based on whether they are relevant to the outcome. The spike-and-slab lasso assigns mixtures of double exponential distributions as priors for the parameters. This framework was initially developed for linear models, later extended to generalized linear models, and shown to perform well in scenarios requiring sparse solutions. Standard formulations of generalized linear models cannot immediately accommodate categorical outcomes with more than two categories, i.e. multinomial outcomes, and require modifications to model specification and parameter estimation. Such modifications are relatively straightforward in a classical setting but require additional theoretical and computational considerations in Bayesian settings, which can depend on the choice of prior distributions for the parameters of interest. While previous developments of the spike-and-slab lasso focused on continuous, count, and/or binary outcomes, we generalize the spike-and-slab lasso to accommodate multinomial outcomes, developing both the theoretical basis for the model and an expectation-maximization algorithm to fit the model. To our knowledge, this is the first generalization of the spike-and-slab lasso to allow for multinomial outcomes.

Keywords: Bayesian variable selection; spike-and-slab; generalized linear models; multinomial outcomes; elastic net

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

Code to reproduce the results of the simulation study and data analysis is available on GitHub (https://github.com/jmleach-bst/multinomial_ssnet_analyses). Note that while code for performing analysis on ADNI data is included, the ADNI data sets themselves are not, because we are not authorized to share data from ADNI. Details for access to these data can be found at http://adni.loni.usc.edu/data-samples/access-data/.

Funding

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health grant number U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
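
Illustrative sketch

The abstract describes the spike-and-slab lasso prior as a mixture of two zero-centered double exponential distributions with different variances. The following is a minimal sketch of that idea only, not the authors' implementation: the function names, the mixing weight `theta`, and the scale values `s0` (spike) and `s1` (slab) are assumptions chosen for illustration. It evaluates the mixture density and the conditional probability that a coefficient belongs to the slab component, the kind of quantity an expectation-maximization algorithm would typically compute in its E-step.

```python
import math


def double_exponential_pdf(beta, scale):
    """Density of a zero-centered double exponential (Laplace) distribution."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def spike_slab_pdf(beta, theta, s0, s1):
    """Mixture density with weight theta on the slab (scale s1)
    and weight 1 - theta on the spike (scale s0 < s1).
    All parameter names are hypothetical, chosen for this sketch."""
    slab = theta * double_exponential_pdf(beta, s1)
    spike = (1.0 - theta) * double_exponential_pdf(beta, s0)
    return slab + spike


def slab_probability(beta, theta, s0, s1):
    """Conditional probability that beta was drawn from the slab component,
    obtained by applying Bayes' rule to the two mixture components."""
    slab = theta * double_exponential_pdf(beta, s1)
    spike = (1.0 - theta) * double_exponential_pdf(beta, s0)
    return slab / (slab + spike)


if __name__ == "__main__":
    # Assumed illustrative values: narrow spike, wide slab, equal weights.
    theta, s0, s1 = 0.5, 0.05, 1.0
    for beta in (0.0, 0.1, 0.5, 2.0):
        p = slab_probability(beta, theta, s0, s1)
        print(f"beta={beta:4.1f}  P(slab | beta)={p:.4f}")
```

Under these assumed values, coefficients near zero are attributed mostly to the spike (strong shrinkage), while larger coefficients are attributed almost entirely to the slab (weak shrinkage), which is the differential shrinkage the abstract refers to.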
Source journal
Journal of Applied Statistics (Mathematics: Statistics & Probability)
CiteScore: 3.40
Self-citation rate: 0.00%
Articles published: 126
Review time: 6 months
Journal description: Journal of Applied Statistics provides a forum for communication between applied statisticians and users of applied statistical techniques across a wide range of disciplines. These areas include business, computing, economics, ecology, education, management, medicine, operational research and sociology, but papers from other areas are also considered. The editorial policy is to publish rigorous but clear and accessible papers on applied techniques. Purely theoretical papers are avoided, but those on theoretical developments that clearly demonstrate significant applied potential are welcomed. Each paper is submitted to at least two independent referees.