Comparison of some correlation measures for continuous and categorical data

E. Skotarczak, A. Dobek, K. Moliński
Biometrical Letters, pp. 253-261. Published 2019-12-01. Journal Article.
DOI: 10.2478/bile-2019-0015 (https://doi.org/10.2478/bile-2019-0015)
Citations: 2

Abstract

Summary. The literature offers a wide collection of correlation and association coefficients for different data structures. Some of these coefficients are conventionally used for continuous data and others for categorical or ordinal observations. The aim of this paper is to verify the performance of various approaches to correlation coefficient estimation for several types of observations. Both simulated and real data were analysed. For continuous variables, Pearson’s r² and MIC were determined, whereas for categorized data three approaches were compared: Cramér’s V, Joe’s estimator, and the regression-based estimator. Two methods of discretization for continuous data were used. The following conclusions were drawn: the regression-based approach yielded the best results for data with the highest assumed r² coefficient, whereas Joe’s estimator was the better approximation of the true correlation when the assumed r² was small; and the MIC estimator detected the maximal level of dependency for data having a quadratic relation. Moreover, discretization applied to data with a non-linear dependency can cause loss of dependency information. The calculations were supported by the R packages arules and minerva.
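The point about discretization and non-linear dependency can be illustrated with a small sketch. This is not the authors' code: the paper used the R packages arules and minerva, while the Python below uses only the standard library. The equal-width binning, the sample size, the noise levels, and all function names are assumptions chosen for illustration. MIC, Joe's estimator, and the regression-based estimator are not implemented here.

```python
import math
import random
from collections import Counter


def pearson_r2(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width intervals (labels 0..k-1).

    Equal-width binning is an assumed discretization method; the abstract
    does not say which two methods the authors used.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]


def cramers_v(a, b):
    """Cramér's V from the chi-square statistic of the a-by-b contingency table."""
    n = len(a)
    joint = Counter(zip(a, b))
    row, col = Counter(a), Counter(b)
    chi2 = 0.0
    for i in row:
        for j in col:
            expected = row[i] * col[j] / n
            chi2 += (joint.get((i, j), 0) - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(row), len(col)) - 1)))


# Simulated data (illustrative parameters, not the paper's simulation design).
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
y_lin = [xi + rng.gauss(0, 0.3) for xi in x]         # linear relation
y_quad = [xi * xi + rng.gauss(0, 0.05) for xi in x]  # quadratic relation

r2_lin = pearson_r2(x, y_lin)
r2_quad = pearson_r2(x, y_quad)
v_quad_5 = cramers_v(equal_width_bins(x, 5), equal_width_bins(y_quad, 5))
v_quad_2 = cramers_v(equal_width_bins(x, 2), equal_width_bins(y_quad, 2))

print(f"r2, linear data:         {r2_lin:.3f}")
print(f"r2, quadratic data:      {r2_quad:.3f}")
print(f"Cramér's V, 5 x 5 bins:  {v_quad_5:.3f}")
print(f"Cramér's V, 2 x 2 bins:  {v_quad_2:.3f}")
```

In this setup Pearson's r² is close to zero for the quadratic data, because x and x² are uncorrelated when x is symmetric about zero. Cramér's V on a 5 x 5 grid still registers the dependency. With only two bins per variable it falls to near zero, since the sign of x carries no information about x². That is one way a discretization choice can discard dependency information.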