High-dimensional randomization-based inference capitalizing on classical design and modern computing.

Q1 Mathematics
Marie-Abele C Bind, D B Rubin
Journal: Behaviormetrika, vol. 50, no. 1, pp. 9-26. Published 2023-01-01. Publication type: Journal Article.
DOI: https://doi.org/10.1007/s41237-022-00183-x
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9849196/pdf/
Citations: 1

Abstract


A common complication that can arise with analyses of high-dimensional data is the repeated use of hypothesis tests. A second complication, especially with small samples, is the reliance on asymptotic p-values. Our proposed approach for addressing both complications uses a scientifically motivated scalar summary statistic, and although not entirely novel, seems rarely used. The method is illustrated using a crossover study of seventeen participants examining the effect of exposure to ozone versus clean air on the DNA methylome, where the multivariate outcome involved 484,531 genomic locations. Our proposed test yields a single null randomization distribution, and thus a single Fisher-exact p-value that is statistically valid whatever the structure of the data. However, the relevance and power of the resultant test require the careful a priori selection of a single test statistic. The common practice of using asymptotic p-values or meaningless thresholds for "significance" is inapposite in general.
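The idea of collapsing a high-dimensional outcome into one scalar statistic and comparing it against a single null randomization distribution can be sketched as follows. This is a minimal illustration and not the authors' implementation: the statistic (the mean over sites of the absolute average within-participant difference), the assumption that each participant's exposure order was decided by an independent fair coin flip, and the names `scalar_statistic` and `randomization_p_value` are all hypothetical choices made here. The abstract does not specify the paper's actual test statistic or assignment mechanism.

```python
import random


def scalar_statistic(diffs, signs):
    """Collapse per-site differences into one number.

    diffs[i][j] is participant i's (ozone minus clean air) difference at
    site j; signs[i] is +1 or -1 and encodes a hypothetical exposure order.
    Returns the mean over sites of |average signed difference| (an
    illustrative statistic, assumed for this sketch).
    """
    n = len(diffs)
    n_sites = len(diffs[0])
    total = 0.0
    for j in range(n_sites):
        site_mean = sum(signs[i] * diffs[i][j] for i in range(n)) / n
        total += abs(site_mean)
    return total / n_sites


def randomization_p_value(diffs, n_draws=1000, seed=0):
    """Monte Carlo approximation to the randomization p-value.

    Under the sharp null of no effect for anyone, swapping a participant's
    exposure order only flips the sign of that participant's differences,
    so the null distribution is built from random sign flips (assumed
    assignment mechanism: independent fair coin per participant).
    """
    rng = random.Random(seed)
    n = len(diffs)
    observed = scalar_statistic(diffs, [1] * n)
    as_extreme = 0
    for _ in range(n_draws):
        signs = [rng.choice((-1, 1)) for _ in range(n)]
        if scalar_statistic(diffs, signs) >= observed:
            as_extreme += 1
    # Count the observed assignment itself so the estimate is never zero.
    return (as_extreme + 1) / (n_draws + 1)
```

Because only one statistic is tested, no multiplicity correction across sites is involved. With seventeen participants and this assumed mechanism there are 2^17 = 131,072 possible sign patterns, so full enumeration in place of random draws would give the exact p-value rather than a Monte Carlo estimate.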

Source journal
Behaviormetrika (Mathematics-Analysis)
CiteScore: 5.10
Self-citation rate: 0.00%
Articles published: 33
About the journal: Behaviormetrika is issued twice a year to provide an international forum for new theoretical and empirical quantitative approaches in data science. When Behaviormetrika was launched in 1974, the journal advocated data science as an interdisciplinary field that included the use of statistical methods to extract meaningful knowledge from data in its various forms, structured or unstructured. Behaviormetrika is the oldest journal addressing the topic of data science. The first editor-in-chief of Behaviormetrika, Dr. Chikio Hayashi, described data science in this way: “Data science is not only a synthetic concept to unify statistics, data analysis, and their related methods; it also comprises its results. Data science is intended to analyze and understand actual phenomena with ‘data.’ In other words, the aim of data science is to reveal the features or the hidden structure of complicated natural, human, and social phenomena using data from a different perspective from the established or traditional theory and method.” Behaviormetrika is a fully refereed international journal, which publishes original research papers, notes, and review articles. Subject areas suitable for publication include but are not limited to the following methodologies and fields.
Methodologies: Data science, Mathematical statistics, Survey methodologies, Artificial intelligence, Information theory, Machine learning, Knowledge discovery in databases (KDD), Graphical models, Computer science, Algorithms.
Fields: Medicine, Psychology, Education, Economics, Marketing, Social science, Sociology, Political science, Policy science, Cognitive science, Brain science.