High-dimensional randomization-based inference capitalizing on classical design and modern computing
Marie-Abele C Bind, D B Rubin
Behaviormetrika, vol. 50, no. 1, pp. 9-26. Published 2023-01-01. DOI: https://doi.org/10.1007/s41237-022-00183-x
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9849196/pdf/
A common complication that can arise with analyses of high-dimensional data is the repeated use of hypothesis tests. A second complication, especially with small samples, is the reliance on asymptotic p-values. Our proposed approach for addressing both complications uses a scientifically motivated scalar summary statistic, and although not entirely novel, seems rarely used. The method is illustrated using a crossover study of seventeen participants examining the effect of exposure to ozone versus clean air on the DNA methylome, where the multivariate outcome involved 484,531 genomic locations. Our proposed test yields a single null randomization distribution, and thus a single Fisher-exact p-value that is statistically valid whatever the structure of the data. However, the relevance and power of the resultant test require the careful a priori selection of a single test statistic. The common practice of using asymptotic p-values or meaningless thresholds for "significance" is inapposite in general.
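The idea of collapsing a high-dimensional outcome into one scalar statistic and comparing it with a single null randomization distribution can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' actual procedure: it assumes each participant's exposure order was decided by an independent fair coin flip, that there are no carryover or period effects, and that Fisher's sharp null of no effect holds, so that swapping a participant's labels simply flips the sign of that participant's row of differences. The statistic `scalar_statistic` (mean over locations of the absolute average difference) and the function names are hypothetical choices made for the example.

```python
import itertools
import numpy as np


def scalar_statistic(diffs, signs):
    """Hypothetical scalar summary (an assumption, not the paper's statistic).

    diffs: array of shape (n_participants, n_locations) holding, for each
           participant, outcome under ozone minus outcome under clean air.
    signs: array of +1/-1 values, one per participant.
    Returns the mean over locations of |average signed difference|.
    """
    per_location_mean = signs @ diffs / diffs.shape[0]
    return float(np.abs(per_location_mean).mean())


def randomization_p_value(diffs):
    """Exact randomization p-value under the sharp null of no effect.

    Enumerates all 2**n within-participant label swaps (sign flips) and
    returns the observed statistic together with the proportion of
    assignments whose statistic is at least as large as the observed one.
    """
    n = diffs.shape[0]
    observed = scalar_statistic(diffs, np.ones(n))
    at_least_as_extreme = 0
    total = 0
    for signs in itertools.product((1.0, -1.0), repeat=n):
        value = scalar_statistic(diffs, np.asarray(signs))
        # Small tolerance so ties are not lost to floating-point rounding.
        if value >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return observed, at_least_as_extreme / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 8 participants, 200 locations (far smaller than the study).
    toy_diffs = rng.normal(loc=0.3, scale=1.0, size=(8, 200))
    obs, p = randomization_p_value(toy_diffs)
    print(f"observed statistic = {obs:.4f}, randomization p-value = {p:.4f}")
```

With seventeen participants, this assumed design has 2^17 = 131,072 possible assignments, so full enumeration is feasible in principle, although at 484,531 locations each evaluation is costly. Whatever statistic is used, it has to be fixed before looking at the outcome data, which is the a priori selection the abstract emphasizes.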
About the journal:
Behaviormetrika is issued twice a year to provide an international forum for new theoretical and empirical quantitative approaches in data science. When Behaviormetrika was launched in 1974, the journal advocated data science as an interdisciplinary field that included the use of statistical methods to extract meaningful knowledge from data in its various forms: structured or unstructured. Behaviormetrika is the oldest journal addressing the topic of data science. The first editor-in-chief of Behaviormetrika, Dr. Chikio Hayashi, described data science in this way: “Data science is not only a synthetic concept to unify statistics, data analysis, and their related methods; it also comprises its results. Data science is intended to analyze and understand actual phenomena with ‘data.’ In other words, the aim of data science is to reveal the features or the hidden structure of complicated natural, human, and social phenomena using data from a different perspective from the established or traditional theory and method.” Behaviormetrika is a fully refereed international journal, which publishes original research papers, notes, and review articles. Subject areas suitable for publication include but are not limited to the following methodologies and fields.
Methodologies: Data science, Mathematical statistics, Survey methodologies, Artificial intelligence, Information theory, Machine learning, Knowledge discovery in databases (KDD), Graphical models, Computer science, Algorithms
Fields: Medicine, Psychology, Education, Economics, Marketing, Social science, Sociology, Political science, Policy science, Cognitive science, Brain science