多分类变量的双样本问题

IF 1.2 4区数学

International Journal of Biostatistics Pub Date : 2006-01-09 DOI:10.2202/1557-4679.1019

A. DiRienzo

{"title":"多分类变量的双样本问题","authors":"A. DiRienzo","doi":"10.2202/1557-4679.1019","DOIUrl":null,"url":null,"abstract":"Comparing two large multivariate distributions is potentially complicated at least for the following reasons. First, some variable/level combinations may have a redundant difference in prevalence between groups in the sense that the difference can be completely explained in terms of lower-order combinations. Second, the total number of variable/level combinations to compare between groups is very large, and likely computationally prohibitive. In this paper, for both the paired and independent sample case, an approximate comparison method is proposed, along with a computationally efficient algorithm, that estimates the set of variable/level combinations that have a non-redundant different prevalence between two populations. The probability that the estimate contains one or more false or redundant differences is asymptotically bounded above by any pre-specified level for arbitrary data-generating distributions. The method is shown to perform well for finite samples in a simulation study, and is used to investigate HIV-1 genotype evolution in a recent AIDS clinical trial.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2006-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1019","citationCount":"0","resultStr":"{\"title\":\"The Two Sample Problem for Multiple Categorical Variables\",\"authors\":\"A. DiRienzo\",\"doi\":\"10.2202/1557-4679.1019\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Comparing two large multivariate distributions is potentially complicated at least for the following reasons. First, some variable/level combinations may have a redundant difference in prevalence between groups in the sense that the difference can be completely explained in terms of lower-order combinations. Second, the total number of variable/level combinations to compare between groups is very large, and likely computationally prohibitive. In this paper, for both the paired and independent sample case, an approximate comparison method is proposed, along with a computationally efficient algorithm, that estimates the set of variable/level combinations that have a non-redundant different prevalence between two populations. The probability that the estimate contains one or more false or redundant differences is asymptotically bounded above by any pre-specified level for arbitrary data-generating distributions. The method is shown to perform well for finite samples in a simulation study, and is used to investigate HIV-1 genotype evolution in a recent AIDS clinical trial.\",\"PeriodicalId\":50333,\"journal\":{\"name\":\"International Journal of Biostatistics\",\"volume\":\"2 1\",\"pages\":\"\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2006-01-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.2202/1557-4679.1019\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Biostatistics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.2202/1557-4679.1019\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Biostatistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.2202/1557-4679.1019","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

由于以下原因，比较两个大型多变量分布可能会很复杂。首先，一些变量/水平组合可能在组间的流行率上有冗余差异，因为这种差异可以完全用低阶组合来解释。其次，要在组间比较的变量/水平组合的总数非常大，并且可能在计算上令人望而却步。在本文中，对于配对和独立样本情况，提出了一种近似比较方法，以及一种计算效率高的算法，该方法可以估计两个种群之间具有非冗余不同患病率的变量/水平组合集。对于任意数据生成分布，估计包含一个或多个错误或冗余差异的概率在任何预先指定的水平上渐近有界。在模拟研究中，该方法在有限样本中表现良好，并在最近的艾滋病临床试验中用于研究HIV-1基因型进化。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

The Two Sample Problem for Multiple Categorical Variables

Comparing two large multivariate distributions is potentially complicated at least for the following reasons. First, some variable/level combinations may have a redundant difference in prevalence between groups in the sense that the difference can be completely explained in terms of lower-order combinations. Second, the total number of variable/level combinations to compare between groups is very large, and likely computationally prohibitive. In this paper, for both the paired and independent sample case, an approximate comparison method is proposed, along with a computationally efficient algorithm, that estimates the set of variable/level combinations that have a non-redundant different prevalence between two populations. The probability that the estimate contains one or more false or redundant differences is asymptotically bounded above by any pre-specified level for arbitrary data-generating distributions. The method is shown to perform well for finite samples in a simulation study, and is used to investigate HIV-1 genotype evolution in a recent AIDS clinical trial.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Biostatistics Mathematics-Statistics and Probability

CiteScore

2.30

自引率

8.30%

发文量

期刊介绍： The International Journal of Biostatistics (IJB) seeks to publish new biostatistical models and methods, new statistical theory, as well as original applications of statistical methods, for important practical problems arising from the biological, medical, public health, and agricultural sciences with an emphasis on semiparametric methods. Given many alternatives to publish exist within biostatistics, IJB offers a place to publish for research in biostatistics focusing on modern methods, often based on machine-learning and other data-adaptive methodologies, as well as providing a unique reading experience that compels the author to be explicit about the statistical inference problem addressed by the paper. IJB is intended that the journal cover the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework. Electronic publication also allows for data and software code to be appended, and opens the door for reproducible research allowing readers to easily replicate analyses described in a paper. Both original research and review articles will be warmly received, as will articles applying sound statistical methods to practical problems.