Hierarchical resampling for bagging in multistudy prediction with applications to human neurochemical sensing.

IF 1.3 · Zone 4 (Mathematics) · Q2 Statistics & Probability
Annals of Applied Statistics Pub Date : 2022-12-01 Epub Date: 2022-09-26 DOI:10.1214/21-aoas1574
Gabriel Loewinger, Prasad Patil, Kenneth T Kishida, Giovanni Parmigiani
Volume 16, Issue 4, pages 2145-2165. Journal Article.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9586160/pdf/nihms-1800688.pdf

Abstract

We propose the "study strap ensemble", which combines advantages of two common approaches to fitting prediction models when multiple training datasets ("studies") are available: pooling studies and fitting one model versus averaging predictions from multiple models each fit to individual studies. The study strap ensemble fits models to bootstrapped datasets, or "pseudo-studies." These are generated by resampling from multiple studies with a hierarchical resampling scheme that generalizes the randomized cluster bootstrap. The study strap is controlled by a tuning parameter that determines the proportion of observations to draw from each study. When the parameter is set to its lowest value, each pseudo-study is resampled from only a single study. When it is high, the study strap ignores the multi-study structure and generates pseudo-studies by merging the datasets and drawing observations like a standard bootstrap. We empirically show the optimal tuning value often lies in between, and prove that special cases of the study strap draw the merged dataset and the set of original studies as pseudo-studies. We extend the study strap approach with an ensemble weighting scheme that utilizes information in the distribution of the covariates of the test dataset. Our work is motivated by neuroscience experiments using real-time neurochemical sensing during awake behavior in humans. Current techniques to perform this kind of research require measurements from an electrode placed in the brain during awake neurosurgery and rely on prediction models to estimate neurotransmitter concentrations from the electrical measurements recorded by the electrode. These models are trained by combining multiple datasets that are collected in vitro under heterogeneous conditions in order to promote accuracy of the models when applied to data collected in the brain. 
A prevailing challenge is deciding how to combine studies or ensemble models trained on different studies to enhance model generalizability. Our methods produce marked improvements in simulations and in this application. All methods are available in the studyStrap CRAN package.
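
The hierarchical resampling idea described above can be illustrated with a minimal sketch. This is not the studyStrap package's API and may differ from the authors' algorithm in its details; it only assumes a two-level scheme in which a hypothetical tuning parameter `bag_size` sets how many study labels are drawn with replacement, and each study's share of the pseudo-study equals its share of those draws. The names `draw_pseudo_study`, `bag_size`, and `n_rows` are illustrative, and drawing rows with replacement is an assumption.

```python
import random
from collections import Counter


def draw_pseudo_study(studies, bag_size, n_rows, rng):
    """Draw one pseudo-study from `studies`, a list of lists of rows."""
    # Level 1: sample `bag_size` study indices with replacement.
    bag = Counter(rng.choices(range(len(studies)), k=bag_size))
    pseudo_study = []
    for k, count in bag.items():
        # Level 2: study k supplies a share count / bag_size of the rows.
        # Rounding means the total can differ slightly from n_rows.
        n_k = round(n_rows * count / bag_size)
        # Rows are drawn with replacement here (an assumption of this sketch).
        pseudo_study.extend(rng.choices(studies[k], k=n_k))
    return pseudo_study


rng = random.Random(0)
# Three toy studies; each row is (study_id, observation_index).
studies = [[(s, i) for i in range(50)] for s in range(3)]

single = draw_pseudo_study(studies, bag_size=1, n_rows=50, rng=rng)
mixed = draw_pseudo_study(studies, bag_size=1000, n_rows=300, rng=rng)
```

In this sketch, `bag_size=1` yields a pseudo-study resampled from a single study, while a large `bag_size` gives every study a nearly equal share, which for equally sized studies resembles bootstrapping the merged data. An ensemble would then fit one model per pseudo-study and average their predictions.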

Source journal
Annals of Applied Statistics (Social Sciences: Statistics & Probability)
CiteScore: 3.10
Self-citation rate: 5.60%
Articles published: 131
Review time: 6-12 weeks
Journal description: Statistical research spans an enormous range from direct subject-matter collaborations to pure mathematical theory. The Annals of Applied Statistics, the newest journal from the IMS, is aimed at papers in the applied half of this range. Published quarterly in both print and electronic form, our goal is to provide a timely and unified forum for all areas of applied statistics.