Maximum Projection Gini Correlation (MaGiC) for mixed categorical and numerical data

IF 0.8 · Zone 4 (Mathematics) · JCR Q3 (Statistics & Probability)
Hong Xiao, Radhakrishna Adhikari, Yixin Chen, Xin Dang
Journal of Statistical Planning and Inference, Volume 239, Article 106294
Published: 2025-04-24
DOI: 10.1016/j.jspi.2025.106294
URL: https://www.sciencedirect.com/science/article/pii/S0378375825000321
Citations: 0

Abstract

We propose a projection correlation to measure the dependence between a multivariate numerical variable and a categorical variable. The projection correlation, defined as the maximum of the Gini correlations (hence MaGiC) between the categorical variable and the univariate projections of the multivariate vector, is non-parametric; intuitively, it produces a high coefficient when the two variables are dependent and equals zero when they are independent. We show that MaGiC is nested: it is non-decreasing as features are added to the numerical vector, and it remains unchanged if the additional features are independent of both the categorical variable and the original features. We establish $\sqrt{n}$-consistency of the sample projection correlation. A powerful $K$-sample test can be carried out via the MaGiC-based independence test. Compared with related correlation definitions for multivariate variables, MaGiC also admits a faster implementation, with computational complexity $O(mn(d + \log n))$, where $d$ is the dimension of the numerical variable, $n$ is the sample size, and $m$ is the number of projections performed, as opposed to $O(dn^2)$ for the Gini correlation. We demonstrate these properties through simulation and application to real datasets.
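To make the definition and the stated complexity concrete, here is a minimal Python sketch. It rests on assumptions not spelled out in the abstract: the univariate Gini correlation is taken in its Gini-mean-difference form, (Δ − Σ_k p_k Δ_k) / Δ, where Δ is the overall Gini mean difference and Δ_k the within-class one, and the maximum over projections is approximated by trying `m` random directions. The function names (`gini_mean_difference`, `gini_correlation`, `magic_random_search`) are hypothetical, and the paper's actual search procedure may differ.

```python
import math
import random


def gini_mean_difference(values):
    """Sample Gini mean difference (mean of |x_i - x_j| over pairs), O(n log n)."""
    n = len(values)
    if n < 2:
        return 0.0
    xs = sorted(values)
    # Sum over pairs of |x_i - x_j| equals sum_i (2i - n - 1) * x_(i), i = 1..n.
    total = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return 2.0 * total / (n * (n - 1))


def gini_correlation(values, labels):
    """Assumed univariate Gini correlation between a numeric and a categorical sample."""
    overall = gini_mean_difference(values)
    if overall == 0.0:
        return 0.0  # no spread in the numeric variable, nothing to explain
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n = len(values)
    # Class-proportion-weighted average of the within-class Gini mean differences.
    within = sum(len(g) / n * gini_mean_difference(g) for g in groups.values())
    return (overall - within) / overall


def magic_random_search(X, labels, m=200, seed=0):
    """Approximate the maximum projection Gini correlation with m random directions.

    Random search is an illustrative assumption, not the paper's stated method.
    Each direction costs O(nd) to project plus O(n log n) to sort.
    """
    rng = random.Random(seed)
    d = len(X[0])
    best = -math.inf
    for _ in range(m):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]  # unit-length direction
        projected = [sum(a * b for a, b in zip(row, u)) for row in X]
        best = max(best, gini_correlation(projected, labels))
    return best
```

Under these assumptions, each of the `m` directions needs one projection of the `n` rows in `d` dimensions and one sort, which is where the O(mn(d + log n)) cost quoted in the abstract would come from. A call such as `magic_random_search(X, labels, m=500)` takes `X` as a list of `n` rows of length `d` and `labels` as a list of `n` class labels.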
Source journal
Journal of Statistical Planning and Inference (Mathematics: Statistics & Probability)
CiteScore: 2.10
Self-citation rate: 11.10%
Articles published: 78
Review time: 3-6 weeks
About the journal: The Journal of Statistical Planning and Inference offers itself as a multifaceted and all-inclusive bridge between classical aspects of statistics and probability, and the emerging interdisciplinary aspects that have the potential to revolutionize the subject. While we maintain our traditional strength in statistical inference, design, classical probability, and large sample methods, we also have a far more inclusive and broadened scope to keep up with the new problems that confront us as statisticians, mathematicians, and scientists. We publish high-quality articles in all branches of statistics, probability, discrete mathematics, machine learning, and bioinformatics. We also especially welcome well-written and up-to-date review articles on fundamental themes of statistics, probability, machine learning, and general biostatistics. Thoughtful letters to the editors, interesting problems in need of a solution, and short notes carrying an element of elegance or beauty are equally welcome.