Principal Components Analysis: Centring and Rotations
Richard G. Brereton
Journal of Chemometrics, 38(12). Published 2024-11-21. DOI: 10.1002/cem.3610
https://onlinelibrary.wiley.com/doi/10.1002/cem.3610
Citations: 0
Abstract
It is rare to perform PCA on raw data, and usually some transformation or pre-processing is performed prior to PCA.
We illustrate the principles of centring using four 6 × 2 datasets, numbered 1 to 4 and presented in Table 1, each consisting of objects A to F, and use primarily graphical approaches. As many readers will choose to centre data prior to PCA, a good understanding of its effects is useful. Readers should be able to reproduce the corresponding numerical results using their favourite package, although some of the graphics might not be available in most software.
So far, this article has primarily been about geometry, but what are the practical consequences? The first and most obvious is that centring usually changes the patterns of the scores. When there are more than two components, rotations can be very complicated algebraically, and there is not room in this introductory article to discuss them in detail; a 3D rotation, for example, is often expressed as three separate 2D rotations, one around each of the axes [5, 6]. Changes in sign can be viewed as reflections in hyperplanes. The directions of the PC axes can vary considerably according to whether the data are centred, and often show much greater deviation than in these simple two-dimensional examples.
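The remark about 3D rotations and sign changes can be made concrete with a short numerical sketch. The function names and angles below are illustrative assumptions, not taken from the article, and the z-y-x ordering is only one of several common conventions.

```python
import numpy as np

def rot_x(a):
    """2D rotation by angle a (radians) in the y-z plane, i.e. around the x axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    """2D rotation by angle a (radians) in the z-x plane, i.e. around the y axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    """2D rotation by angle a (radians) in the x-y plane, i.e. around the z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_3d(ax, ay, az):
    """Compose three axis rotations into one 3D rotation (assumed order: x, then y, then z)."""
    return rot_z(az) @ rot_y(ay) @ rot_x(ax)

# Arbitrary illustrative angles in radians.
R = rotation_3d(0.3, -0.5, 1.1)

# A proper rotation is orthogonal and has determinant +1.
is_orthogonal = np.allclose(R.T @ R, np.eye(3))
det_R = np.linalg.det(R)

# Changing the sign of one axis is a reflection in the plane perpendicular to it:
# still orthogonal, but with determinant -1, so it is not a rotation.
F = np.diag([1.0, -1.0, 1.0])
det_F = np.linalg.det(F)
```

The determinant distinguishes the two cases: a change in the sign of a single PC cannot be undone by any rotation, which is why it is treated separately as a reflection.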
As such, centring is not straightforward to visualise and understand when the number of variables is large. Whether it is appropriate to centre depends on the questions being asked. For example, in spectroscopy of mixtures, we may be primarily interested in the properties of individual chemical components above a baseline, so centring might not be appropriate. In many other areas, we are interested in variability around a mean, and so it is best to centre the data. This article discusses the influence of column centring on PC scores.
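As a rough illustration of how column centring can change the scores and the direction of the PC axes, the following sketch computes PCA by singular value decomposition with and without centring. The 6 × 2 matrix is hypothetical and is not one of the datasets in Table 1; the helper names `pca_scores` and `angle_between` are likewise assumptions made for this example.

```python
import numpy as np

def pca_scores(X, centre=True):
    """Return (scores, loadings) from an SVD-based PCA, optionally column centring first."""
    X = np.asarray(X, dtype=float)
    if centre:
        X = X - X.mean(axis=0)  # subtract each column's mean
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U * s              # scores T = U S
    loadings = Vt.T             # columns are the PC directions
    return scores, loadings

def angle_between(p, q):
    """Angle in degrees between two unit vectors, ignoring their arbitrary sign."""
    cos = min(abs(float(p @ q)), 1.0)
    return float(np.degrees(np.arccos(cos)))

# Hypothetical 6 x 2 data for objects A to F (NOT the article's Table 1).
X = np.array([
    [2.0, 9.0],
    [3.0, 8.5],
    [4.0, 8.0],
    [5.0, 7.0],
    [6.0, 6.5],
    [7.0, 6.0],
])

T_raw, P_raw = pca_scores(X, centre=False)
T_cen, P_cen = pca_scores(X, centre=True)

# Angle between the first PC axis of the uncentred and the centred data.
angle = angle_between(P_raw[:, 0], P_cen[:, 0])
print(f"PC1 direction changes by {angle:.1f} degrees after centring")
```

In this made-up example the uncentred PC1 points roughly from the origin towards the centroid of the data, whereas the centred PC1 follows the direction of greatest variation about the mean, so both the axes and the score patterns differ.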
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.