Beyond Neyman-Pearson: E-values enable hypothesis testing with a data-driven alpha.
Author: Peter D Grünwald
Journal: Proceedings of the National Academy of Sciences of the United States of America (JCR Q1, Multidisciplinary Sciences; impact factor 9.4)
Published: 2024-09-20 (Journal Article)
DOI: 10.1073/pnas.2302098121 (https://doi.org/10.1073/pnas.2302098121)
Citations: 0
Abstract
A standard practice in statistical hypothesis testing is to mention the P-value alongside the accept/reject decision. We show the advantages of mentioning an e-value instead. With P-values, it is not clear how to use an extreme observation (e.g., $p \ll \alpha$) for getting better frequentist decisions. With e-values it is straightforward, since they provide Type-I risk control in a generalized Neyman-Pearson setting with the decision task (a general loss function) determined post hoc, after observation of the data, thereby providing a handle on "roving $\alpha$'s." When Type-II risks are taken into consideration, the only admissible decision rules in the post hoc setting turn out to be e-value-based. Similarly, if the loss incurred when specifying a faulty confidence interval is not fixed in advance, standard confidence intervals and distributions may fail, whereas e-confidence sets and e-posteriors still provide valid risk guarantees. Sufficiently powerful e-values have by now been developed for a range of classical testing problems. We discuss the main challenges for wider development and deployment.
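To make the "data-driven alpha" idea concrete, here is a minimal simulation sketch of what we take to be the simplest form of the post hoc Type-I risk guarantee: an e-value is a nonnegative statistic S with expectation at most 1 under the null, and if we reject at a level α̂ chosen after seeing the data (rejecting iff S ≥ 1/α̂), then the expected value of 1{reject}/α̂ under the null stays at most 1. Everything in the code is a hypothetical illustration, not the author's implementation: the normal-mean model, the fixed alternative `MU`, the sample size `N`, the grid `ALPHA_GRID`, and the helper names `e_values`, `p_values` and `roving_alpha_risk` are all assumptions made for this sketch. It contrasts the e-value rule with the analogous "roving alpha" rule built on a one-sided z-test p-value.

```python
import math

import numpy as np

# Hypothetical setup (not from the paper): H0: X_i ~ N(0, 1) versus a
# simple alternative H1: X_i ~ N(MU, 1), with MU fixed before seeing data.
MU = 0.3
N = 20
# Candidate significance levels; alpha is chosen from this grid AFTER seeing the data.
ALPHA_GRID = np.array([0.001, 0.005, 0.01, 0.05, 0.1])


def e_values(x: np.ndarray) -> np.ndarray:
    """Likelihood ratio p_MU(x) / p_0(x) for each row; its mean under H0 is 1."""
    return np.exp(MU * x.sum(axis=1) - N * MU**2 / 2)


def p_values(x: np.ndarray) -> np.ndarray:
    """One-sided z-test p-values P0(Z >= z) with z = sum(x) / sqrt(N)."""
    z = x.sum(axis=1) / math.sqrt(N)
    return 0.5 * np.vectorize(math.erfc)(z / math.sqrt(2))


def roving_alpha_risk(reject_at: np.ndarray) -> np.ndarray:
    """reject_at[i, j] says whether row i rejects at level ALPHA_GRID[j].
    Pick the smallest such alpha (alpha_hat) per row and return the
    loss 1{reject} / alpha_hat, which is 0 for rows that never reject."""
    alpha_hat = np.where(reject_at, ALPHA_GRID, np.inf).min(axis=1)
    return 1.0 / alpha_hat  # 1.0 / inf evaluates to 0.0


rng = np.random.default_rng(0)
x_null = rng.standard_normal((200_000, N))  # data generated under H0

s = e_values(x_null)
p = p_values(x_null)

# E-value rule: reject at level alpha iff S >= 1 / alpha.
risk_e = roving_alpha_risk(s[:, None] >= 1.0 / ALPHA_GRID)
# P-value rule: reject at level alpha iff p <= alpha.
risk_p = roving_alpha_risk(p[:, None] <= ALPHA_GRID)

print(f"mean of S under H0:         {s.mean():.3f}")
print(f"Type-I risk, e-value rule:  {risk_e.mean():.3f}")
print(f"Type-I risk, p-value rule:  {risk_p.mean():.3f}")
```

Why the e-value rule is safe: whenever it rejects, S ≥ 1/α̂, so pointwise 1{reject}/α̂ ≤ S, and taking expectations gives a risk of at most E[S] ≤ 1 no matter how α̂ depends on the data. The p-value rule has no such bound: since p is uniform under H0, its expected risk on this grid is exactly 0.001·1000 + 0.004·200 + 0.005·100 + 0.04·20 + 0.05·10 = 3.6, well above 1.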
About the journal:
The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.