Statistical Inference for Data Adaptive Target Parameters.

IF 1.0 · Zone 4 (Mathematics) · JCR Q4: MATHEMATICAL & COMPUTATIONAL BIOLOGY
Alan E Hubbard, Sara Kherad-Pajouh, Mark J van der Laan
International Journal of Biostatistics, vol. 12, no. 1, pp. 3-19. Published 2016-05-01. DOI: 10.1515/ijb-2015-0013 (https://doi.org/10.1515/ijb-2015-0013)
Citation count: 62

Abstract

Suppose one observes n i.i.d. copies of a random variable whose probability distribution is known to be an element of a particular statistical model. To define our statistical target, we partition the sample into V equal-size subsamples and use this partitioning to define V splits of the data into an estimation sample (one of the V subsamples) and the corresponding complementary parameter-generating sample. To each of the V parameter-generating samples we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an estimator of this type of data adaptive target parameter, together with a corresponding central limit theorem. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. The new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming “data-driven”, the theory developed in this paper provides a new impetus for greater involvement of statistical inference in problems that are increasingly addressed by clever, yet ad hoc, pattern-finding methods. To suggest this potential, and to verify the predictions of the theory, we present extensive simulation studies and a data analysis based on adaptively determined intervention rules, which give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
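The sample-splitting scheme described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `data_adaptive_mean`, the toy "algorithm" (pick the column with the largest mean on the parameter-generating sample), and the standard-error formula (average within-fold variance divided by n) are all hypothetical choices made here to show the structure of the approach. The abstract itself does not specify a variance estimator.

```python
import numpy as np

def data_adaptive_mean(X, V=10, seed=0):
    """Sample-split estimate of a data-adaptively chosen column mean.

    X is an (n, p) array. Hypothetical example: on each split, the
    parameter-generating sample selects the column with the largest
    mean, and the estimation sample estimates that column's mean.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Partition the indices into V (near-)equal-size subsamples.
    folds = np.array_split(rng.permutation(n), V)

    fold_estimates, fold_variances, chosen = [], [], []
    for est_idx in folds:
        # The complement of the estimation sample generates the parameter.
        gen_mask = np.ones(n, dtype=bool)
        gen_mask[est_idx] = False
        j = int(np.argmax(X[gen_mask].mean(axis=0)))
        # The estimation sample estimates the split-specific target.
        x = X[est_idx, j]
        fold_estimates.append(x.mean())
        fold_variances.append(x.var(ddof=1))
        chosen.append(j)

    # Average the V split-specific estimates.
    psi = float(np.mean(fold_estimates))
    # Assumed standard error: pooled within-fold variance over n.
    se = float(np.sqrt(np.mean(fold_variances) / n))
    return psi, (psi - 1.96 * se, psi + 1.96 * se), chosen

# Toy data: column 2 has a clearly larger mean than the others.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] += 5.0
psi, ci, chosen = data_adaptive_mean(X, V=10)
print(psi, ci, chosen)
```

The key design point is that the data used to choose the target on a given split never overlap with the data used to estimate it, which is what allows exploratory selection and confirmatory inference within the same data set.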
Source journal: International Journal of Biostatistics
Category: MATHEMATICAL & COMPUTATIONAL BIOLOGY-STATISTICS & PROBABILITY
CiteScore: 2.10
Self-citation rate: 8.30%
Articles published: 28
Review time: >12 weeks
Journal description: The International Journal of Biostatistics (IJB) seeks to publish new biostatistical models and methods, new statistical theory, and original applications of statistical methods to important practical problems arising from the biological, medical, public health, and agricultural sciences, with an emphasis on semiparametric methods. Given that many alternatives for publishing exist within biostatistics, IJB offers a venue for biostatistical research focusing on modern methods, often based on machine learning and other data-adaptive methodologies, and provides a unique reading experience by compelling the author to be explicit about the statistical inference problem addressed by the paper. The journal is intended to cover the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework. Electronic publication also allows data and software code to be appended, and opens the door to reproducible research by allowing readers to easily replicate the analyses described in a paper. Both original research and review articles will be warmly received, as will articles applying sound statistical methods to practical problems.