Comments on Divergence vs. Decision P-values

IF 1 · Zone 4, Mathematics · Q3 STATISTICS & PROBABILITY

Scandinavian Journal of Statistics · Pub Date: 2023-04-12 · DOI: 10.1111/sjos.12647

Paul W. Vos


Citations: 3

Abstract


Comments on Divergence vs. Decision P‐values

The distinction between the two uses of p-values described by Professor Greenland is related to two distinct interpretations of frequentist probability, that is, probability used to describe a random event. I will illustrate with a simple example. In the North Carolina Pick-4 lottery, 10 ping pong balls labeled with distinct digits from \(I_9 = \{0, 1, \ldots, 9\}\) are mixed in a clear container, and opening a door allows a single ball to be selected. Prior to opening the door, blown air mixes the balls, making it plausible that each ball is equally likely to be selected. This is repeated with three identical containers to obtain the remaining three digits. If a winning ticket is defined as one where the sum of the four digits exceeds 28, the state can charge $5 for a ticket with a $100 prize and expect a profit. There are 330 of \(10^4\) possible outcomes where the sum exceeds 28, so the expected payout per ticket is 0.033 × $100 = $3.30. This calculation requires no repeated sampling, but it is natural for the state to interpret this value as a long-run average. For an individual ticket holder, all that is required is that each ball is given an equal chance to be selected for the drawing associated with his ticket. The ticket holder does not need to imagine a long sequence of draws, just as a cancer patient does not need to consider a long sequence of 5-year periods to understand a 30% 5-year survival. Using terminology from Vos and Holbert (2022), the scope for the ticket holder is specific while that of the state is generic. The uniform distribution on the 4-tuples in \(I_9^4 = I_9 \times I_9 \times I_9 \times I_9\) provides a model for repeated draws of the Pick-4 lottery, that is, of the data generation process. For most inference applications, the distribution of an unknown population can be modeled rather than the process that generated the data. We modify this example to consider inference.
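The count of 330 winning outcomes and the $3.30 expected payout can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original commentary; the names `count_winning`, `PRIZE`, and `PRICE` are illustrative assumptions, and the expected profit per ticket is derived here from the stated $5 price rather than quoted from the text.

```python
from fractions import Fraction
from itertools import product

DIGITS = range(10)   # labels on the balls: 0, 1, ..., 9
THRESHOLD = 28       # a ticket wins when the digit sum exceeds 28
PRIZE = 100          # prize in dollars
PRICE = 5            # ticket price in dollars


def count_winning(digits=DIGITS, threshold=THRESHOLD, draws=4):
    """Count the equally likely 4-tuples whose digit sum exceeds the threshold."""
    return sum(1 for t in product(digits, repeat=draws) if sum(t) > threshold)


total = len(DIGITS) ** 4                 # 10^4 possible outcomes
wins = count_winning()                   # outcomes with sum > 28
p_win = Fraction(wins, total)            # exact winning probability
expected_payout = p_win * PRIZE          # expected prize paid per ticket
expected_profit = PRICE - expected_payout  # state's expected profit per ticket

print(wins, total)             # 330 10000
print(float(expected_payout))  # 3.3
print(float(expected_profit))  # 1.7
```

Nothing in this calculation simulates repeated draws: it only counts equally likely outcomes, which is the point made about the individual ticket holder.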
We are told the sum of a single lottery draw and are to infer whether the draw came from the NC lottery or from lottery A, which also has four containers, each containing 8 balls with labels from \(I_7 = \{0, 1, \ldots, 7\}\). The sum of the digits is 29, but no other information is given. A reduction-to-contradiction argument establishes that the result came from the NC lottery. Premise: lottery A produced our data; every possible sum from lottery A belongs to the set \(\{0, 1, \ldots, 28\}\); 29 is not in this set; conclusion: the contradiction means it is impossible that the premise is true.
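The contradiction argument rests on the set of sums each lottery can produce. As a hedged illustration, not taken from the original text, the sketch below enumerates both sets; `possible_sums` is a hypothetical helper introduced only for this check.

```python
from itertools import product


def possible_sums(max_digit, draws=4):
    """Return every sum obtainable from `draws` balls labeled 0..max_digit."""
    return {sum(t) for t in product(range(max_digit + 1), repeat=draws)}


nc_sums = possible_sums(9)  # NC lottery: labels 0..9
a_sums = possible_sums(7)   # lottery A: labels 0..7
observed = 29               # the reported digit sum

print(min(a_sums), max(a_sums))  # 0 28
print(observed in a_sums)        # False
print(observed in nc_sums)       # True
```

Because 29 lies outside the sums lottery A can produce but inside those of the NC lottery, the premise that lottery A generated the data is ruled out with certainty, with no probability statement required.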


Source journal

Scandinavian Journal of Statistics (Mathematics: Statistics & Probability)

CiteScore: 1.80
Self-citation rate: 0.00%
Articles published:
Review time: 6-12 weeks

About the journal: The Scandinavian Journal of Statistics is internationally recognised as one of the leading statistical journals in the world. It was founded in 1974 by four Scandinavian statistical societies. Today more than eighty per cent of the manuscripts are submitted from outside Scandinavia. It is an international journal devoted to reporting significant and innovative original contributions to statistical methodology, both theory and applications. The journal specializes in statistical modelling showing particular appreciation of the underlying substantive research problems. The emergence of specialized methods for analysing longitudinal and spatial data is just one example of an area of important methodological development in which the Scandinavian Journal of Statistics has a particular niche.