Principal Components Analysis: Centring and Rotations

Richard G. Brereton
Journal of Chemometrics, volume 38, issue 12. Publication date: 2024-11-21. DOI: 10.1002/cem.3610
https://onlinelibrary.wiley.com/doi/10.1002/cem.3610

Abstract

It is rare to perform PCA on raw data, and usually some transformation or pre-processing is performed prior to PCA.

We will illustrate the principles of centring using four 6 × 2 datasets, numbered 1 to 4 and presented in Table 1, each consisting of objects A to F, and will primarily use graphical approaches. As many readers will choose to centre data prior to PCA, a good understanding of its effects is useful. Readers should be able to reproduce the corresponding numerical results using their favourite package, but some of the graphics might not be available in most software.
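
As a rough illustration of how such numerical results might be reproduced, the sketch below computes PC scores and loadings for a single 6 × 2 dataset with and without column centring. The data values are hypothetical and are not those of Table 1; the helper `pca_scores` and the use of an SVD via NumPy are assumptions made for this example, not the article's own procedure.

```python
import numpy as np

# Hypothetical 6 x 2 dataset (objects A to F); NOT the values from Table 1.
X = np.array([
    [2.0, 1.0],
    [3.0, 2.5],
    [4.0, 3.0],
    [5.0, 4.5],
    [6.0, 5.0],
    [7.0, 6.5],
])

def pca_scores(data, centre):
    """Return (scores, loadings) from an SVD, optionally column-centring first."""
    if centre:
        data = data - data.mean(axis=0)  # subtract each column's mean
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    return u * s, vt.T  # scores T = U S, loadings = V

scores_raw, loadings_raw = pca_scores(X, centre=False)
scores_cen, loadings_cen = pca_scores(X, centre=True)

# Angle between the uncentred and centred PC1 directions (sign-independent).
cos_angle = abs(loadings_raw[:, 0] @ loadings_cen[:, 0])
angle_deg = np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))

print("Uncentred scores:\n", scores_raw)
print("Centred scores:\n", scores_cen)
print("PC1 direction change (degrees):", angle_deg)
```

The signs of the scores and loadings returned by an SVD are arbitrary, so the comparison of PC1 directions uses the absolute value of the cosine.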

So far, this article has primarily been about geometry, but what are the practical consequences? The first and most obvious is that centring will usually change the patterns of the scores. When there are more than two components, rotations can be very complicated algebraically, and there is not room in this introductory article to discuss them in detail; a 3D rotation, for example, is often expressed as three separate 2D rotations, one around each of the axes [5, 6]. Changes in sign can be viewed as reflections in hyperplanes. The directions of the PC axes can vary considerably according to whether the data are centred, and often show much greater deviation than in these simple two-dimensional examples.
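
The decomposition of a 3D rotation into three planar rotations, and the view of a sign change as a reflection, can be sketched numerically. The angles, the order in which the rotations are composed, and the function names below are arbitrary assumptions chosen for illustration; other conventions are equally valid.

```python
import numpy as np

def rot_x(a):
    """Rotation by angle a (radians) about the x axis, acting in the y-z plane."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    """Rotation by angle a (radians) about the y axis, acting in the x-z plane."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    """Rotation by angle a (radians) about the z axis, acting in the x-y plane."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Arbitrary illustrative angles; the z-y-x composition order is one convention.
ax, ay, az = 0.3, -0.7, 1.1
R = rot_z(az) @ rot_y(ay) @ rot_x(ax)

# A proper rotation is orthogonal with determinant +1.
is_orthogonal = np.allclose(R.T @ R, np.eye(3))
det_R = np.linalg.det(R)

# Flipping the sign of one axis is a reflection in a (hyper)plane:
# the result is still orthogonal, but the determinant becomes -1.
F = np.diag([1.0, -1.0, 1.0])
det_flipped = np.linalg.det(R @ F)

print(is_orthogonal, det_R, det_flipped)
```

The determinant distinguishes the two cases: composing rotations keeps it at +1, whereas a single sign change turns it to -1, which is why a sign flip cannot be undone by any rotation.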

As such, centring is not straightforward to visualise and understand when the number of variables is large. Whether it is appropriate to centre depends on the questions being asked. For example, in spectroscopy of mixtures, we may be primarily interested in the properties of individual chemical components above a baseline, so centring might not be appropriate. In many other areas, we are interested in variability around a mean, and so it is best to centre the data. This article discusses the influence of column centring on PC scores.
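
Column centring and one of its effects on the scores can be written compactly. The notation is an assumption for illustration and not necessarily the article's own: X is the I × J data matrix, 1 is a column vector of I ones, x̄ is the vector of column means, and the scores T are defined through a singular value decomposition of the centred matrix.

```latex
\begin{align}
\mathbf{X}_c &= \mathbf{X} - \mathbf{1}\,\bar{\mathbf{x}}^{\mathsf{T}},
  \qquad \bar{\mathbf{x}} = \tfrac{1}{I}\,\mathbf{X}^{\mathsf{T}}\mathbf{1}, \\
\mathbf{X}_c &= \mathbf{U}\mathbf{S}\mathbf{V}^{\mathsf{T}},
  \qquad \mathbf{T} = \mathbf{U}\mathbf{S} = \mathbf{X}_c\mathbf{V}, \\
\mathbf{1}^{\mathsf{T}}\mathbf{T} &= \mathbf{1}^{\mathsf{T}}\mathbf{X}_c\mathbf{V}
  = \bigl(\mathbf{1}^{\mathsf{T}}\mathbf{X} - I\,\bar{\mathbf{x}}^{\mathsf{T}}\bigr)\mathbf{V}
  = \mathbf{0}^{\mathsf{T}}.
\end{align}
```

The last line shows that the scores of column-centred data sum to zero on every component, so they are distributed around the origin; scores computed from uncentred data generally do not have this property.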


Source journal: Journal of Chemometrics (Chemistry, Analytical Chemistry)
CiteScore: 5.20. Self-citation rate: 8.30%. Articles published: 78. Review time: 2 months.
About the journal: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.