Symbolic covariance matrix for interval-valued variables and its application to principal component analysis

K. Kosmelj, J. Le-Rademacher, L. Billard
Advances in Methodology and Statistics, 2014. DOI: https://doi.org/10.51936/nbpe2127

Abstract

In the last two decades, principal component analysis (PCA) has been extended to interval-valued data, and several adaptations of the classical approach are known from the literature. Our approach is based on the symbolic covariance matrix Cov for interval-valued variables proposed by Billard (2008). Its crucial advantage over other approaches is that it fully utilizes all the information in the data. The symbolic covariance matrix can be decomposed into a within part, CovW, and a between part, CovB. We propose a further insight into the PCA results: the proportion of explained variance due to the within information and the proportion due to the between information can be calculated. Additionally, we suggest performing PCA on CovB and on CovW to obtain deeper insight into the data under study. In the case study presented, the information gain from performing PCA on the intervals instead of the interval midpoints (conditionally, the means) is about 45%. It turns out that, for these data, the assumption of uniformity over the intervals does not hold, so analysis of the data represented as histogram-valued variables is suggested.
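The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that each interval [a, b] is uniformly distributed, so that CovB is the covariance of the interval midpoints (divisor n) and CovW is the sum of range cross-products divided by 12n. The function names, the per-component split of each eigenvalue into within and between parts, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def symbolic_cov(lower, upper):
    """Return (Cov, CovW, CovB) for interval data given as n x p bound arrays.

    Assumption: values are uniform within each interval.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n = lower.shape[0]
    mid = (lower + upper) / 2.0            # interval midpoints
    width = upper - lower                  # interval ranges
    centered = mid - mid.mean(axis=0)
    cov_b = centered.T @ centered / n      # between part: spread of midpoints
    cov_w = width.T @ width / (12.0 * n)   # within part: spread inside intervals
    return cov_b + cov_w, cov_w, cov_b

def pca_with_split(cov, cov_w, cov_b):
    """Eigen-decompose Cov and split each eigenvalue into within/between parts."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: Cov is symmetric
    order = np.argsort(eigvals)[::-1]       # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # v_k' CovW v_k and v_k' CovB v_k for every component k
    within = np.einsum("ik,ij,jk->k", eigvecs, cov_w, eigvecs)
    between = np.einsum("ik,ij,jk->k", eigvecs, cov_b, eigvecs)
    return eigvals, eigvecs, within, between

# Made-up toy data: 4 observations, 2 interval-valued variables.
lower = np.array([[1.0, 10.0], [2.0, 12.0], [4.0, 15.0], [5.0, 19.0]])
upper = np.array([[3.0, 14.0], [5.0, 13.0], [6.0, 20.0], [9.0, 22.0]])

cov, cov_w, cov_b = symbolic_cov(lower, upper)
eigvals, eigvecs, within, between = pca_with_split(cov, cov_w, cov_b)

for k in range(len(eigvals)):
    print(f"PC{k + 1}: variance={eigvals[k]:.4f}, "
          f"within share={within[k] / eigvals[k]:.3f}, "
          f"between share={between[k] / eigvals[k]:.3f}")
```

Because Cov = CovW + CovB, the within and between parts of each component sum to its eigenvalue, so the two shares printed for a component add up to one. Running PCA on `cov_b` alone corresponds to analysing only the midpoints.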