Simpson's aggregation paradox in nonparametric statistical analysis: Theory, computation, and susceptibility in public health data

Impact factor: 1.3 | JCR Q3 | Mathematics, Interdisciplinary Applications
S. Sanders, J. Ehrlich, James W. Boudreau
Frontiers in Applied Mathematics and Statistics. Published 2023-03-27. DOI: 10.3389/fams.2023.1169164 (https://doi.org/10.3389/fams.2023.1169164)
Citations: 0

Abstract

This study establishes sufficient conditions for observing instances of Simpson's (data aggregation) Paradox under rank sum scoring (RSS), as used, e.g., in the Wilcoxon-Mann-Whitney (WMW) rank sum test. The WMW test is a primary nonparametric statistical test in FDA drug product evaluation and other prominent medical settings. Using computational nonparametric statistical methods, we also establish the relative frequency with which paradox-generating Simpson Reversals occur under RSS when an initial data sequence is pooled with its ordinal replicate. For each 2-sample, n-element-per-sample (2 × n) case of RSS considered, strict Reversals occurred for between 0% and 1.74% of data poolings across the whole sample space, a rate roughly similar to that observed for 2 × 2 × 2 contingency tables and considerably lower than that observed for path models. The Reversal rate conditional on the observed initial sequence is highly variable. Despite a mode at 0%, this rate exceeds 20% for some initial sequences. Our empirical application identifies clusters of Simpson Reversal susceptibility in publicly released mobile phone radiofrequency exposure data. Simpson Reversals under RSS are not simply a theoretical concern but can reverse nonparametric or parametric biostatistical results even in vitally important public health settings. Conceptually, Paradox incidence can be viewed as a robustness check on a given WMW statistical test result. When an instance of Paradox occurs, the results constituting this instance are found to be data-scale dependent. Given that the rate of Reversal can vary substantially by initial sequence, calculating this rate conditional on the observed initial sequence represents a potentially important robustness check on a result.
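A Simpson Reversal under rank sum scoring can be illustrated with a small sketch. It is not the authors' code. It assumes that an "ordinal replicate" is a second 2 × n data set whose within-set rank ordering matches the first, and that with equal sample sizes the sample with the larger rank sum is the one favored. The function names (`midranks`, `rank_sums`, `favored`, `is_strict_reversal`) and the data values are hypothetical, chosen only to make the effect visible.

```python
def midranks(values):
    """Return 1-based ranks of `values`, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sums(a, b):
    """Rank the combined observations and return (rank sum of a, rank sum of b)."""
    r = midranks(list(a) + list(b))
    return sum(r[:len(a)]), sum(r[len(a):])


def favored(a, b):
    """Sample with the larger rank sum (assumes equal sample sizes)."""
    ra, rb = rank_sums(a, b)
    return "A" if ra > rb else "B" if rb > ra else "tie"


def is_strict_reversal(a1, b1, a2, b2):
    """True if both data sets favor one sample but their pooling favors the other."""
    w1, w2 = favored(a1, b1), favored(a2, b2)
    wp = favored(list(a1) + list(a2), list(b1) + list(b2))
    return w1 == w2 and w1 != "tie" and wp != "tie" and wp != w1


# Illustrative values only (not from the paper): two 2 x 5 data sets with the
# same ordinal pattern (A low, A low, five B values, three A high values).
a1, b1 = [1, 2, 40, 41, 42], [30, 31, 32, 33, 34]
a2, b2 = [3, 4, 20, 21, 22], [10, 11, 12, 13, 14]

print(rank_sums(a1, b1))            # rank sums within data set 1
print(rank_sums(a2, b2))            # rank sums within data set 2
print(rank_sums(a1 + a2, b1 + b2))  # rank sums after pooling
print(is_strict_reversal(a1, b1, a2, b2))
```

In this constructed case, sample A has the larger rank sum within each data set, yet sample B has the larger rank sum once the two are pooled. The reversal depends on how the two data sets interleave on the measurement scale, which is consistent with the data-scale dependence described above.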
Source journal
Frontiers in Applied Mathematics and Statistics (Mathematics: Statistics and Probability)
CiteScore: 1.90
Self-citation rate: 7.10%
Articles published: 117
Review time: 14 weeks