{"title":"基于内核的快速准确独立性测试,适用于高维数据和函数数据","authors":"Jin-Ting Zhang , Tianming Zhu","doi":"10.1016/j.jmva.2024.105320","DOIUrl":null,"url":null,"abstract":"<div><p>Testing the dependency between two random variables is an important inference problem in statistics since many statistical procedures rely on the assumption that the two samples are independent. To test whether two samples are independent, a so-called HSIC (Hilbert–Schmidt Independence Criterion)-based test has been proposed. Its null distribution is approximated either by permutation or a Gamma approximation. In this paper, a new HSIC-based test is proposed. Its asymptotic null and alternative distributions are established. It is shown that the proposed test is root-<span><math><mi>n</mi></math></span> consistent. A three-cumulant matched chi-squared-approximation is adopted to approximate the null distribution of the test statistic. By choosing a proper reproducing kernel, the proposed test can be applied to many different types of data including multivariate, high-dimensional, and functional data. Three simulation studies and two real data applications show that in terms of level accuracy, power, and computational cost, the proposed test outperforms several existing tests for multivariate, high-dimensional, and functional data.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.4000,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A fast and accurate kernel-based independence test with applications to high-dimensional and functional data\",\"authors\":\"Jin-Ting Zhang , Tianming Zhu\",\"doi\":\"10.1016/j.jmva.2024.105320\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Testing the dependency between two random variables is an important inference problem in statistics since many statistical procedures rely on the assumption that the two samples are independent. To test whether two samples are independent, a so-called HSIC (Hilbert–Schmidt Independence Criterion)-based test has been proposed. Its null distribution is approximated either by permutation or a Gamma approximation. In this paper, a new HSIC-based test is proposed. Its asymptotic null and alternative distributions are established. It is shown that the proposed test is root-<span><math><mi>n</mi></math></span> consistent. A three-cumulant matched chi-squared-approximation is adopted to approximate the null distribution of the test statistic. By choosing a proper reproducing kernel, the proposed test can be applied to many different types of data including multivariate, high-dimensional, and functional data. 
Three simulation studies and two real data applications show that in terms of level accuracy, power, and computational cost, the proposed test outperforms several existing tests for multivariate, high-dimensional, and functional data.</p></div>\",\"PeriodicalId\":16431,\"journal\":{\"name\":\"Journal of Multivariate Analysis\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2024-04-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Multivariate Analysis\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0047259X24000277\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Multivariate Analysis","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0047259X24000277","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Abstract
Testing the dependency between two random variables is an important inference problem in statistics, since many statistical procedures rely on the assumption that the two samples are independent. To test whether two samples are independent, a so-called HSIC (Hilbert–Schmidt Independence Criterion)-based test has been proposed. Its null distribution is approximated either by permutation or by a Gamma approximation. In this paper, a new HSIC-based test is proposed. Its asymptotic null and alternative distributions are established. It is shown that the proposed test is root-n consistent. A three-cumulant matched chi-squared approximation is adopted to approximate the null distribution of the test statistic. By choosing a proper reproducing kernel, the proposed test can be applied to many different types of data, including multivariate, high-dimensional, and functional data. Three simulation studies and two real data applications show that, in terms of level accuracy, power, and computational cost, the proposed test outperforms several existing tests for multivariate, high-dimensional, and functional data.
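For readers unfamiliar with the baseline the abstract refers to, the sketch below shows the classical empirical HSIC statistic with Gaussian kernels and a permutation-approximated null distribution, i.e., the existing approach the paper improves upon. It does not reproduce the paper's new test statistic, its three-cumulant matched chi-squared approximation, or its kernel choices for high-dimensional and functional data; the function names and the median-heuristic bandwidth are illustrative assumptions only.

```python
# Minimal sketch: biased empirical HSIC with Gaussian kernels and a
# permutation p-value. This is the standard existing test mentioned in the
# abstract, not the paper's proposed method.
import numpy as np


def gaussian_gram(z, bandwidth=None):
    """Gram matrix K[i, j] = exp(-||z_i - z_j||^2 / (2 * bandwidth^2))."""
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    if bandwidth is None:
        # Median heuristic: a common default, not the paper's prescription.
        med = np.median(sq_dists[sq_dists > 0])
        bandwidth = np.sqrt(0.5 * med) if med > 0 else 1.0
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def hsic_statistic(x, y):
    """Biased empirical HSIC: trace(K H L H) / n^2, H the centering matrix."""
    n = x.shape[0]
    K = gaussian_gram(x)
    L = gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2


def hsic_permutation_test(x, y, n_perm=999, seed=None):
    """Approximate the null by permuting y, which breaks any dependence on x."""
    rng = np.random.default_rng(seed)
    observed = hsic_statistic(x, y)
    perm_stats = np.array(
        [hsic_statistic(x, y[rng.permutation(len(y))]) for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((100, 3))
    y = x[:, :1] + 0.5 * rng.standard_normal((100, 1))  # y depends on x
    stat, p = hsic_permutation_test(x, y, n_perm=199, seed=1)
    print(f"HSIC = {stat:.4f}, permutation p-value = {p:.3f}")
```

The permutation step is the computational bottleneck that motivates closed-form null approximations such as the Gamma approximation and the three-cumulant matched chi-squared approximation adopted in the paper, which avoid recomputing the statistic hundreds of times.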
About the journal
Founded in 1971, the Journal of Multivariate Analysis (JMVA) is the central venue for the publication of new, relevant methodology and particularly innovative applications pertaining to the analysis and interpretation of multidimensional data.
The journal welcomes contributions to all aspects of multivariate data analysis and modeling, including cluster analysis, discriminant analysis, factor analysis, and multidimensional continuous or discrete distribution theory. Topics of current interest include, but are not limited to, inferential aspects of
Copula modeling
Functional data analysis
Graphical modeling
High-dimensional data analysis
Image analysis
Multivariate extreme-value theory
Sparse modeling
Spatial statistics.