Study of the Principal Components Method Modifications Resistance to Abnormal Observations

Q3 Mathematics
V. Goryainov, E. Goryainova
Journal: Herald of the Bauman Moscow State Technical University. Series Natural Sciences
DOI: 10.18698/1812-3368-2023-2-17-34 (https://doi.org/10.18698/1812-3368-2023-2-17-34)
Publication date: 2023-04-01
Publication type: Journal Article
Citations: 0

Abstract

The paper considers the problem of reducing the dimension of a set of correlated multidimensional indicators. One approach to this problem is the method of principal components, which compactly describes a vector with correlated coordinates (components) by a vector of principal components with uncorrelated coordinates and a much smaller dimension, while retaining most of the information about the correlation structure of the original vector. Several modifications of the principal components method, differing in how the correlation matrix of the observation vector is estimated, were compared on simulated and real data. The objective of the work is to demonstrate the advantages of robust modifications of the principal components method when the data contain abnormal values. To compare the modifications on simulated data, a metric was introduced that measures the difference between the estimated and true eigenvalues of the correlation matrix of the initial data. The behavior of this metric as a function of the probability distribution of the observations was studied by computer simulation. The distributions chosen were multivariate distributions with non-diagonal correlation matrices that simulate a contaminated sample. Next, a sample of 13 correlated socioeconomic indicators for 85 countries was considered, in which 46 abnormal values were identified. All of the considered modifications selected the same optimal number of principal components, namely three. However, the quality of compression of the real data, defined as the share of the total variance of the initial indicators explained by the first three principal components, turned out to be significantly higher for the robust modifications. The results obtained on the real data agree well with the conclusions of the computer simulation.
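The comparison described in the abstract can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the robust estimator (a Spearman rank correlation with the usual Gaussian consistency correction), the form of the metric (Euclidean distance between sorted eigenvalue vectors), the contamination model (a fraction eps of rows replaced by uncorrelated noise of large scale), and all function names and parameter values.

```python
import numpy as np

def pearson_corr(x):
    """Classical sample correlation matrix (columns are variables)."""
    return np.corrcoef(x, rowvar=False)

def spearman_corr(x):
    """Rank-based correlation matrix (assumed robust estimator, not the paper's).
    Ranks are computed per column; continuous data are assumed, so no ties."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    rho_s = np.corrcoef(ranks, rowvar=False)
    # Consistency correction for Gaussian data: rho = 2 * sin(pi * rho_s / 6).
    r = 2.0 * np.sin(np.pi * rho_s / 6.0)
    np.fill_diagonal(r, 1.0)
    return r

def sorted_eigenvalues(r):
    """Eigenvalues of a symmetric matrix in descending order."""
    return np.sort(np.linalg.eigvalsh(r))[::-1]

def eigenvalue_distance(r_est, r_true):
    """Hypothetical metric: Euclidean distance between sorted eigenvalues."""
    return float(np.linalg.norm(sorted_eigenvalues(r_est) - sorted_eigenvalues(r_true)))

def explained_share(r, k):
    """Share of total variance described by the first k principal components."""
    lam = sorted_eigenvalues(r)
    return float(lam[:k].sum() / lam.sum())

def contaminated_sample(rng, n, r_true, eps, scale):
    """Gaussian sample with correlation r_true; roughly a fraction eps of rows
    is replaced by uncorrelated noise with standard deviation `scale`."""
    p = r_true.shape[0]
    chol = np.linalg.cholesky(r_true)
    x = rng.standard_normal((n, p)) @ chol.T
    outlier = rng.random(n) < eps
    x[outlier] = scale * rng.standard_normal((int(outlier.sum()), p))
    return x

# Illustrative run: 5 variables with pairwise correlation 0.7, about 10% outliers.
p, rho = 5, 0.7
r_true = np.full((p, p), rho)
np.fill_diagonal(r_true, 1.0)
rng = np.random.default_rng(0)
x = contaminated_sample(rng, n=2000, r_true=r_true, eps=0.1, scale=10.0)

for name, estimator in [("classical", pearson_corr), ("rank-based", spearman_corr)]:
    r_est = estimator(x)
    print(name,
          "eigenvalue distance:", round(eigenvalue_distance(r_est, r_true), 3),
          "share explained by PC1:", round(explained_share(r_est, 1), 3))
```

In this toy setting the classical estimate is pulled toward the identity matrix by the outliers, so its leading eigenvalue and the explained-variance share shrink, while the rank-based estimate stays closer to the true eigenvalues. The rank-based matrix after the sine correction is not guaranteed to be positive semidefinite; a production implementation would need to handle that case.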
Source journal
CiteScore: 1.10
Self-citation rate: 0.00%
Articles published: 40
About the journal: The journal aims to publish the most significant results of fundamental and applied studies and developments carried out at research and industrial institutions in the following fields (ASJC code): 2600 Mathematics; 2200 Engineering; 3100 Physics and Astronomy; 1600 Chemistry; 1700 Computer Science.