Two-sample Behrens–Fisher problems for high-dimensional data: a normal reference F-type test

IF 1.4 | Region 4, Mathematics | JCR Q3, STATISTICS & PROBABILITY

Computational Statistics Pub Date : 2023-11-24 DOI:10.1007/s00180-023-01433-6

Tianming Zhu, Pengfei Wang, Jin-Ting Zhang


Citations: 0

Abstract


Two-sample Behrens–Fisher problems for high-dimensional data: a normal reference F-type test

The problem of testing the equality of mean vectors for high-dimensional data has been intensively investigated in the literature. However, most of the existing tests impose strong assumptions on the underlying group covariance matrices which may not be satisfied or hardly be checked in practice. In this article, an F-type test for two-sample Behrens–Fisher problems for high-dimensional data is proposed and studied. When the two samples are normally distributed and when the null hypothesis is valid, the proposed F-type test statistic is shown to be an F-type mixture, a ratio of two independent \(\chi ^2\)-type mixtures. Under some regularity conditions and the null hypothesis, it is shown that the proposed F-type test statistic and the above F-type mixture have the same normal and non-normal limits. It is then justified to approximate the null distribution of the proposed F-type test statistic by that of the F-type mixture, resulting in the so-called normal reference F-type test. Since the F-type mixture is a ratio of two independent \(\chi ^2\)-type mixtures, we employ the Welch–Satterthwaite \(\chi ^2\)-approximation to the distributions of the numerator and the denominator of the F-type mixture respectively, resulting in an approximation F-distribution whose degrees of freedom can be consistently estimated from the data. The asymptotic power of the proposed F-type test is established. Two simulation studies are conducted and they show that in terms of size control, the proposed F-type test outperforms two existing competitors. The good performance of the proposed F-type test is also illustrated by a COVID-19 data example.
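The Welch–Satterthwaite step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a \(\chi ^2\)-type mixture of the form \(\sum_r w_r A_r\) with positive weights \(w_r\) and independent \(A_r \sim \chi ^2_1\), and matches its mean and variance to those of \(\beta \chi ^2_d\). The function names and the example weights are hypothetical and chosen only for illustration; in practice the weights are unknown and the degrees of freedom must be estimated from the data, and the estimators used in the paper are not reproduced here.

```python
import random


def welch_satterthwaite(weights):
    """Match mean and variance of sum_r w_r * A_r (A_r iid chi^2_1)
    to those of beta * chi^2_d, and return (beta, d)."""
    s1 = sum(weights)                 # mean of the mixture
    s2 = sum(w * w for w in weights)  # half the variance of the mixture
    return s2 / s1, s1 * s1 / s2


def f_approx_df(num_weights, den_weights):
    """Degrees of freedom (d1, d2) of the F-distribution approximating the
    ratio of two independent mixtures, each divided by its own mean."""
    _, d1 = welch_satterthwaite(num_weights)
    _, d2 = welch_satterthwaite(den_weights)
    return d1, d2


def draw_mixture(weights, rng):
    """One draw of sum_r w_r * A_r; chi^2_1 is Gamma(shape=0.5, scale=2)."""
    return sum(w * rng.gammavariate(0.5, 2.0) for w in weights)


def simulate_ratio(num_weights, den_weights, n_draws, seed=0):
    """Monte Carlo draws of the mean-normalised ratio of two mixtures,
    for comparison against the F(d1, d2) approximation."""
    rng = random.Random(seed)
    m1, m2 = sum(num_weights), sum(den_weights)
    return [
        (draw_mixture(num_weights, rng) / m1)
        / (draw_mixture(den_weights, rng) / m2)
        for _ in range(n_draws)
    ]
```

With equal weights the approximation is exact: for example, four unit weights give \(d = 4\) and \(\beta = 1\). The simulated ratios from `simulate_ratio` can be compared with quantiles of an F-distribution with the returned degrees of freedom to judge how well the approximation holds for unequal weights.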


Source journal

Computational Statistics (Mathematics: Statistics & Probability)

CiteScore: 2.90

Self-citation rate: 0.00%

Articles published: 122

Review time: >12 weeks

About the journal: Computational Statistics (CompStat) is an international journal which promotes the publication of applications and methodological research in the field of Computational Statistics. The focus of papers in CompStat is on the contribution to and influence of computing on statistics and vice versa. The journal provides a forum for computer scientists, mathematicians, and statisticians in a variety of fields of statistics such as biometrics, econometrics, data analysis, graphics, simulation, algorithms, knowledge based systems, and Bayesian computing. CompStat publishes hardware, software plus package reports.