Association Between Nominal Categorical Variables: New Measure Formulation Based on Metric Distances and Value Validity

IF 0.6 Q4 STATISTICS & PROBABILITY
Tarald O. Kvålseth
DOI: 10.1007/s42519-023-00344-5
Journal: Journal of Statistical Theory and Practice
Publication date: 2023-09-27
Publication type: Journal Article
URL: https://doi.org/10.1007/s42519-023-00344-5

Abstract

When dealing with nominal categorical data, it is often desirable to know the degree of association or dependence between the categorical variables. While there is virtually no limit to the number of alternative association measures that have been proposed over the years, they all yield greatly varying, contradictory, and unreliable results because they lack an important property: value validity. After discussing the value-validity property, this paper introduces a new measure of association (dependence) based on the mean Euclidean distance between probability distributions, one being a distribution under independence. Both the asymmetric form, for when one variable can be considered the explanatory (independent) variable and the other the response (dependent) variable, and the symmetric form of the measure are introduced. Particular emphasis is given to the important 2 × 2 case, in which each variable has two categories, but the general case of any number of categories is also covered. Besides having the value-validity property, the new measure has all the prerequisites of a good association measure. Comparisons are made with the well-known Goodman–Kruskal lambda and tau measures. A statistical inference procedure for the new measure is also derived, and numerical examples are provided.
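
For readers unfamiliar with the two comparison measures named in the abstract, the following is a minimal sketch of the standard asymmetric Goodman–Kruskal lambda and tau for a contingency table of counts. It does not implement the paper's new measure, whose formula is not given in the abstract. The function names, the convention that rows are the explanatory variable and columns the response, and the example table are all assumptions made for illustration.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the column variable from the row variable.

    `table` is a list of rows of non-negative counts (hypothetical layout:
    rows = explanatory variable, columns = response variable).
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    max_col = max(col_totals)
    if n == max_col:
        raise ValueError("lambda is undefined when one column holds all counts")
    # Reduction in modal-category prediction errors once the row is known.
    return (sum(max(row) for row in table) - max_col) / (n - max_col)


def goodman_kruskal_tau(table):
    """Tau for predicting the column variable from the row variable."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Sum of squared marginal column proportions.
    marginal = sum((c / n) ** 2 for c in col_totals)
    if marginal == 1:
        raise ValueError("tau is undefined when one column holds all counts")
    # Sum over cells of p_ij^2 / p_i+, skipping empty rows.
    conditional = sum(
        cell ** 2 / (n * r)
        for row, r in zip(table, row_totals)
        if r > 0
        for cell in row
    )
    return (conditional - marginal) / (1 - marginal)


if __name__ == "__main__":
    # Hypothetical 2 x 2 table of counts, not taken from the paper.
    example = [[30, 10], [10, 30]]
    print(goodman_kruskal_lambda(example))  # 0.5
    print(goodman_kruskal_tau(example))     # 0.25
```

On this made-up table the two measures already disagree (0.5 versus 0.25), which is the kind of discrepancy between established measures that the abstract refers to.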