High-dimensional randomization-based inference capitalizing on classical design and modern computing.

Q1 Mathematics

Behaviormetrika Pub Date : 2023-01-01 DOI:10.1007/s41237-022-00183-x

Marie-Abele C Bind, D B Rubin

Behaviormetrika, volume 50, issue 1, pages 9-26. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9849196/pdf/

Citations: 1

Abstract


A common complication that can arise with analyses of high-dimensional data is the repeated use of hypothesis tests. A second complication, especially with small samples, is the reliance on asymptotic p-values. Our proposed approach for addressing both complications uses a scientifically motivated scalar summary statistic, and although not entirely novel, seems rarely used. The method is illustrated using a crossover study of seventeen participants examining the effect of exposure to ozone versus clean air on the DNA methylome, where the multivariate outcome involved 484,531 genomic locations. Our proposed test yields a single null randomization distribution, and thus a single Fisher-exact p-value that is statistically valid whatever the structure of the data. However, the relevance and power of the resultant test require the careful a priori selection of a single test statistic. The common practice of using asymptotic p-values or meaningless thresholds for "significance" is inapposite in general.
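The idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the authors' analysis. Several things here are assumptions that the abstract does not state: the scalar summary (the mean over locations of the absolute average within-participant difference), the premise that each participant's exposure order was randomized independently (so that, under the sharp null of no effect, re-randomization amounts to flipping the sign of that participant's difference vector), and the use of Monte Carlo draws to approximate the randomization distribution rather than enumerating every possible assignment. The function names are hypothetical.

```python
import numpy as np


def summary_statistic(mean_diffs):
    """Hypothetical scalar summary: mean absolute average difference over locations."""
    return np.abs(mean_diffs).mean(axis=-1)


def randomization_p_value(diffs, n_draws=2000, seed=0):
    """Approximate a single randomization p-value for a two-period crossover.

    diffs: array of shape (n_participants, n_locations) holding each
           participant's (ozone - clean air) outcome difference per location.
    Assumes independent randomization of exposure order per participant, so a
    re-randomization under the sharp null is a random sign flip per row.
    """
    rng = np.random.default_rng(seed)
    n = diffs.shape[0]

    # One scalar for the observed assignment, whatever the number of locations.
    observed = summary_statistic(diffs.mean(axis=0))

    # Each row of `signs` is one hypothetical re-randomization of exposure order.
    signs = rng.choice([-1.0, 1.0], size=(n_draws, n))
    null_means = signs @ diffs / n            # shape (n_draws, n_locations)
    null_stats = summary_statistic(null_means)

    # Monte Carlo p-value; the +1 terms count the observed assignment itself.
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_draws)
    return observed, null_stats, p_value


if __name__ == "__main__":
    # Simulated stand-in data: 17 participants, 500 locations, no true effect.
    rng = np.random.default_rng(1)
    simulated = rng.normal(size=(17, 500))
    obs, null_stats, p = randomization_p_value(simulated)
    print(f"observed statistic = {obs:.4f}, p-value = {p:.4f}")
```

Because all locations are collapsed into one pre-specified scalar before the null distribution is built, there is a single p-value and no multiplicity adjustment across locations. With seventeen participants the full set of sign patterns under this assumed design has 2^17 = 131,072 elements, so exact enumeration in batches would also be feasible.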


Source journal

Behaviormetrika Mathematics-Analysis

CiteScore

5.10

Self-citation rate

0.00%

Articles published

Journal description: Behaviormetrika is issued twice a year to provide an international forum for new theoretical and empirical quantitative approaches in data science. When Behaviormetrika was launched in 1974, the journal advocated data science as an interdisciplinary field that included the use of statistical methods to extract meaningful knowledge from data in its various forms: structured or unstructured. Behaviormetrika is the oldest journal addressing the topic of data science.

The first editor-in-chief of Behaviormetrika, Dr. Chikio Hayashi, described data science in this way: “Data science is not only a synthetic concept to unify statistics, data analysis, and their related methods; it also comprises its results. Data science is intended to analyze and understand actual phenomena with ‘data.’ In other words, the aim of data science is to reveal the features or the hidden structure of complicated natural, human, and social phenomena using data from a different perspective from the established or traditional theory and method.”

Behaviormetrika is a fully refereed international journal, which publishes original research papers, notes, and review articles. Subject areas suitable for publication include but are not limited to the following methodologies and fields.

Methodologies: Data science, Mathematical statistics, Survey methodologies, Artificial intelligence, Information theory, Machine learning, Knowledge discovery in databases (KDD), Graphical models, Computer science, Algorithms

Fields: Medicine, Psychology, Education, Economics, Marketing, Social science, Sociology, Political science, Policy science, Cognitive science, Brain science