The Two Sample Problem for Multiple Categorical Variables

IF 1.2 4区 数学
A. DiRienzo
{"title":"The Two Sample Problem for Multiple Categorical Variables","authors":"A. DiRienzo","doi":"10.2202/1557-4679.1019","DOIUrl":null,"url":null,"abstract":"Comparing two large multivariate distributions is potentially complicated at least for the following reasons. First, some variable/level combinations may have a redundant difference in prevalence between groups in the sense that the difference can be completely explained in terms of lower-order combinations. Second, the total number of variable/level combinations to compare between groups is very large, and likely computationally prohibitive. In this paper, for both the paired and independent sample case, an approximate comparison method is proposed, along with a computationally efficient algorithm, that estimates the set of variable/level combinations that have a non-redundant different prevalence between two populations. The probability that the estimate contains one or more false or redundant differences is asymptotically bounded above by any pre-specified level for arbitrary data-generating distributions. The method is shown to perform well for finite samples in a simulation study, and is used to investigate HIV-1 genotype evolution in a recent AIDS clinical trial.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2006-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1019","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Biostatistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.2202/1557-4679.1019","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Comparing two large multivariate distributions is potentially complicated at least for the following reasons. First, some variable/level combinations may have a redundant difference in prevalence between groups in the sense that the difference can be completely explained in terms of lower-order combinations. Second, the total number of variable/level combinations to compare between groups is very large, and likely computationally prohibitive. In this paper, for both the paired and independent sample case, an approximate comparison method is proposed, along with a computationally efficient algorithm, that estimates the set of variable/level combinations that have a non-redundant different prevalence between two populations. The probability that the estimate contains one or more false or redundant differences is asymptotically bounded above by any pre-specified level for arbitrary data-generating distributions. The method is shown to perform well for finite samples in a simulation study, and is used to investigate HIV-1 genotype evolution in a recent AIDS clinical trial.
多分类变量的双样本问题
由于以下原因,比较两个大型多变量分布可能会很复杂。首先,一些变量/水平组合可能在组间的流行率上有冗余差异,因为这种差异可以完全用低阶组合来解释。其次,要在组间比较的变量/水平组合的总数非常大,并且可能在计算上令人望而却步。在本文中,对于配对和独立样本情况,提出了一种近似比较方法,以及一种计算效率高的算法,该方法可以估计两个种群之间具有非冗余不同患病率的变量/水平组合集。对于任意数据生成分布,估计包含一个或多个错误或冗余差异的概率在任何预先指定的水平上渐近有界。在模拟研究中,该方法在有限样本中表现良好,并在最近的艾滋病临床试验中用于研究HIV-1基因型进化。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
International Journal of Biostatistics
International Journal of Biostatistics Mathematics-Statistics and Probability
CiteScore
2.30
自引率
8.30%
发文量
28
期刊介绍: The International Journal of Biostatistics (IJB) seeks to publish new biostatistical models and methods, new statistical theory, as well as original applications of statistical methods, for important practical problems arising from the biological, medical, public health, and agricultural sciences with an emphasis on semiparametric methods. Given many alternatives to publish exist within biostatistics, IJB offers a place to publish for research in biostatistics focusing on modern methods, often based on machine-learning and other data-adaptive methodologies, as well as providing a unique reading experience that compels the author to be explicit about the statistical inference problem addressed by the paper. IJB is intended that the journal cover the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework. Electronic publication also allows for data and software code to be appended, and opens the door for reproducible research allowing readers to easily replicate analyses described in a paper. Both original research and review articles will be warmly received, as will articles applying sound statistical methods to practical problems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信