Margarita Hartlieb. Agronomy Journal, vol. 117, no. 5. Published 2025-09-05. DOI: 10.1002/agj2.70151
https://acsess.onlinelibrary.wiley.com/doi/10.1002/agj2.70151
Author contributions: Margarita Hartlieb: conceptualization; writing (original draft). The author declares no conflicts of interest.
Citations: 0
Abstract
Preserving the art of multivariate thinking: PCA and the wisdom of experience
The grand tradition of science as practiced by figures like Darwin (1859) or Alexander von Humboldt (1871), where all observations, thoughts, and conclusions were meticulously recorded, has largely faded. Today's scientific landscape is becoming increasingly specialized, with fewer opportunities to explore knowledge in a holistic, interconnected way. Adding to this challenge, many long-standing academic positions once held by senior professors are being remodeled or phased out. Even when successors are appointed, much of the predecessor's tacit knowledge often remains undocumented, putting a vast intellectual legacy at risk of vanishing. This makes it all the more important that experienced scholars take the time to document and share their accumulated knowledge.
Editorials like “Notes on the Use of Principal Component Analysis in Agronomic Research” by Matthew (2025) offer a valuable remedy, creating space to pass on insights gained over a lifetime that cannot easily be captured in typical research papers. Even if certain methodologies may no longer be at the cutting edge, the context, experience, and wisdom embedded in such reflections can guide new generations of scientists.
In this editorial, the author reflects on four decades of experience using principal component analysis (PCA) in agronomic research, highlighting its strengths as a tool for data exploration and pattern detection, particularly in multivariate datasets where traditional univariate approaches may overlook significant interactions among traits.
Drawing on his own teaching experience, he opens the editorial with an applied example: a custom biometric dataset from students that illustrates how PCA captures both correlated and independent traits. The author also challenges the widely held belief that data must always be standardized before PCA, showing that PCA of the correlation matrix computed from unstandardized data yields the same loading coefficients as PCA of standardized data.
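The standardization point can be checked numerically. The following is a minimal sketch, not taken from the editorial: the three simulated traits, their scales, and the helper `sorted_loadings` are all hypothetical. It compares the loadings from the correlation matrix of raw data with those from the covariance matrix of standardized data, and contrasts both with covariance-based PCA on the raw data, where scale does matter.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
shared = rng.normal(size=n)
# Hypothetical traits on very different scales (not data from the editorial).
X = np.column_stack([
    1000.0 * shared + rng.normal(scale=300.0, size=n),
    0.01 * shared + rng.normal(scale=0.005, size=n),
    rng.normal(scale=5.0, size=n),
])

def sorted_loadings(matrix):
    """Eigenvalues (descending) and eigenvectors of a symmetric matrix.

    Each eigenvector's sign is fixed so that its largest-magnitude
    entry is positive, which makes two solutions comparable.
    """
    vals, vecs = np.linalg.eigh(matrix)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    pivot = np.abs(vecs).argmax(axis=0)
    signs = np.sign(vecs[pivot, np.arange(vecs.shape[1])])
    return vals, vecs * signs

# Standardized copy of the data (mean 0, sample standard deviation 1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

vals_corr_raw, load_corr_raw = sorted_loadings(np.corrcoef(X, rowvar=False))
vals_cov_std, load_cov_std = sorted_loadings(np.cov(Z, rowvar=False))
vals_cov_raw, load_cov_raw = sorted_loadings(np.cov(X, rowvar=False))

# Correlation-matrix PCA on raw data matches PCA on standardized data.
same_loadings = np.allclose(load_corr_raw, load_cov_std, atol=1e-8)
# Covariance-matrix PCA on raw data is dominated by the large-scale trait.
scale_matters = not np.allclose(load_corr_raw, load_cov_raw, atol=1e-3)
print(same_loadings, scale_matters)
```

The equivalence holds because the correlation matrix is, by definition, the covariance matrix of the standardized variables; standardizing first only matters when the PCA routine works from the covariance matrix.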
The editorial also introduces an innovative approach the author developed after years of using PCA. He cross-correlates the original principal component (PC) scores with the scores from a modified analysis to assess how including a new variable or excluding an existing one redistributes information across components. When a set of PC scores is cross-correlated with itself, the result is a diagonal of correlation values of 1.0 against a background of zero correlations. In the example shown, adding a new variable to a five-variable PCA split the original PC3 into two PCs (PC3 and PC4), and the original PCs 4 and 5 were demoted to PCs 5 and 6.
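The cross-correlation idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's own code: the simulated five-trait dataset, the added sixth variable, and the helper names `pc_scores` and `cross_correlate` are hypothetical.

```python
import numpy as np

def pc_scores(X):
    """PC scores from correlation-matrix PCA, ordered by descending eigenvalue."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(vals)[::-1]
    return Z @ vecs[:, order]

def cross_correlate(scores_a, scores_b):
    """Correlations between every column of scores_a and every column of scores_b."""
    n_a = scores_a.shape[1]
    full = np.corrcoef(scores_a, scores_b, rowvar=False)
    return full[:n_a, n_a:]

rng = np.random.default_rng(1)
n = 60
shared = rng.normal(size=(n, 1))
# Hypothetical five-trait dataset: three traits share a common factor.
X5 = rng.normal(size=(n, 5)) + shared * np.array([1.0, 0.8, 0.5, 0.0, 0.0])
# Hypothetical sixth variable added to the analysis.
X6 = np.hstack([X5, rng.normal(size=(n, 1))])

scores5 = pc_scores(X5)
scores6 = pc_scores(X6)

# Self cross-correlation: ones on the diagonal, zeros elsewhere.
self_corr = cross_correlate(scores5, scores5)
# Rows are original PCs, columns are PCs of the six-variable analysis.
shift = cross_correlate(scores5, scores6)
print(np.round(shift, 2))
```

Because the six-variable analysis contains all five original variables, the squared correlations in each row of `shift` sum to 1, so each row shows how one original PC is distributed across the new components. Signs are arbitrary, since an eigenvector and its negative are equally valid.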
In the next section, the author questions the common rule of discarding PCs with eigenvalues below 1, demonstrating through concrete examples that such components can still harbor biologically relevant signals. This oversight is frequently rooted in the tendency to overlook or underreport nonsignificant results, despite their potential to carry valuable scientific insight and contextual relevance (Dushoff et al., 2019). Accordingly, the author suggests selecting principal components for biplots based on their biological relevance rather than solely on the amount of variance they explain.
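For context on the eigenvalue rule: the eigenvalues of a correlation matrix sum to the number of variables, so an eigenvalue of 1 corresponds to the variance of one average standardized variable. The sketch below, with a hypothetical two-level treatment and simulated traits that are not from the editorial, applies the rule and then correlates every PC with the treatment, so that discarded components can be inspected rather than assumed to be noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
treatment = np.repeat([0.0, 1.0], n // 2)   # hypothetical two-level treatment
size = rng.normal(size=n)                   # dominant shared "size" factor

# Hypothetical traits: all follow the size factor; the treatment shifts
# traits 3 and 4 in opposite directions (a contrast, not a size effect).
X = np.column_stack([
    size + 0.3 * rng.normal(size=n),
    size + 0.3 * rng.normal(size=n),
    size + 0.3 * rng.normal(size=n) + 0.6 * treatment,
    size + 0.3 * rng.normal(size=n) - 0.6 * treatment,
])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
scores = Z @ vecs

# The eigenvalue-greater-than-1 rule keeps only these components.
kept_by_rule = vals > 1.0

# Absolute correlation of each PC's scores with the treatment.
treatment_corr = np.array([
    abs(np.corrcoef(scores[:, k], treatment)[0, 1]) for k in range(vals.size)
])

for k in range(vals.size):
    print(f"PC{k + 1}: eigenvalue={vals[k]:.3f} "
          f"kept={kept_by_rule[k]} |r with treatment|={treatment_corr[k]:.2f}")
```

The data are constructed so that the treatment acts through a contrast between traits rather than through overall size; whether that contrast lands in a component with an eigenvalue below 1 is something to read off the printed table, not a guaranteed outcome of the sketch.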
In addition, he highlights a misconception in applying PCA: that linear functions of existing variables can be added as new traits. While such additions may slightly alter loading coefficients, they contribute no new information to the data matrix and simply result in empty components.
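The "empty component" can be seen directly in the eigenvalues. In this minimal sketch (simulated data and a hypothetical derived trait, not an example from the editorial), a fourth column is built as a linear function of two existing columns; the rank of the data matrix does not change, and the extra component has an eigenvalue of zero.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))                 # three hypothetical traits

# Hypothetical derived trait: a linear function of traits 1 and 2.
derived = 2.0 * X[:, [0]] - 0.5 * X[:, [1]] + 3.0
X_aug = np.hstack([X, derived])

# Eigenvalues of the correlation matrix, largest first.
eig_orig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
eig_aug = np.sort(np.linalg.eigvalsh(np.corrcoef(X_aug, rowvar=False)))[::-1]

# Rank of the centered data: the number of independent dimensions.
rank_orig = np.linalg.matrix_rank(X - X.mean(axis=0))
rank_aug = np.linalg.matrix_rank(X_aug - X_aug.mean(axis=0))

print("eigenvalues, 3 traits:", np.round(eig_orig, 3))
print("eigenvalues, 4 traits:", np.round(eig_aug, 3))
print("rank before and after:", rank_orig, rank_aug)
```

The eigenvalues of the augmented analysis still sum to 4, so the derived trait reweights the existing components while the last component carries no variance.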
Likewise, the editorial raises concerns about varimax rotation. Although rotation may make principal components appear easier to interpret by amplifying larger loadings and suppressing smaller ones, it redistributes information across components, which may disrupt meaningful pattern detection and obscure biologically relevant signals. In his analysis, rotated PCs exhibit weaker correlations with their original counterparts and unexpected overlaps with other components, ultimately complicating rather than enhancing biological interpretation. Zhou et al. (2023) provide an example: experimental treatment effects detected in analysis of variance (ANOVA) of PC scores disappeared following varimax rotation.
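The redistribution caused by rotation can be illustrated with a small sketch. Several assumptions apply: the data are simulated, two components are retained, `varimax` is a common SVD-based iteration for the varimax criterion rather than the routine used in the editorial or in Zhou et al. (2023), and rotated scores are obtained by applying the rotation matrix to standardized PC scores, which is one convention among several.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Varimax rotation of a loading matrix via the usual SVD iteration.

    Returns the rotated loadings and the orthogonal rotation matrix R.
    """
    p, k = loadings.shape
    R = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ R
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vh = np.linalg.svd(loadings.T @ target)
        R = u @ vh
        if previous != 0.0 and s.sum() / previous < 1.0 + tol:
            break
        previous = s.sum()
    return loadings @ R, R

rng = np.random.default_rng(4)
n = 100
factors = rng.normal(size=(n, 2))
# Hypothetical mixing of two latent factors into six traits.
mixing = np.array([[1.0, 0.6], [0.9, 0.5], [0.8, -0.4],
                   [0.2, 1.0], [-0.3, 0.9], [0.5, -0.8]])
X = factors @ mixing.T + 0.4 * rng.normal(size=(n, 6))

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

k = 2                                            # components retained (assumption)
L = vecs[:, :k] * np.sqrt(vals[:k])              # unrotated loadings
F = Z @ vecs[:, :k] / np.sqrt(vals[:k])          # standardized PC scores

rotated_L, R = varimax(L)
F_rot = F @ R                                    # rotated scores (one convention)

# Cross-correlation of original scores (rows) with rotated scores (columns).
cross = np.corrcoef(F, F_rot, rowvar=False)[:k, k:]
print(np.round(cross, 3))
```

Under this convention the cross-correlation matrix equals the rotation matrix itself, so any rotation away from the identity lowers the diagonal correlations and introduces off-diagonal overlap, which is the pattern the editorial describes. The communalities (row sums of squared loadings) are unchanged by the rotation; only their distribution across components shifts.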
In closing, the author offers additional application insights, including the uncommon advice to use PCA explicitly for dimensionality reduction, not just trait association, which can overcome the limitations of traditional repeated-measures ANOVA when dealing with multilayered data. He goes a step further by highlighting the strength of PCA not only in dimensionality reduction but also in dimensionality retention. Particularly in agronomy, where datasets may contain multiple overlapping trait associations, PCA allows researchers to preserve and distinguish these independent signals.
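One way to read the dimensionality-reduction advice is to treat each repeated measurement as a variable, reduce the set to a few PC scores, and run an ordinary one-way ANOVA on each score. The sketch below follows that reading under loud assumptions: the growth-curve data, the three treatments, and the hand-written `one_way_f` helper are hypothetical and only illustrate the workflow.

```python
import numpy as np

def one_way_f(values, labels):
    """F statistic of a one-way ANOVA of values grouped by labels."""
    levels = np.unique(labels)
    grand_mean = values.mean()
    ss_between = sum(
        values[labels == g].size * (values[labels == g].mean() - grand_mean) ** 2
        for g in levels
    )
    ss_within = sum(
        ((values[labels == g] - values[labels == g].mean()) ** 2).sum()
        for g in levels
    )
    df_between = levels.size - 1
    df_within = values.size - levels.size
    return (ss_between / df_between) / (ss_within / df_within)

rng = np.random.default_rng(5)
n_per_group, n_times = 12, 8
groups = np.repeat([0, 1, 2], n_per_group)       # hypothetical treatments
time = np.arange(n_times)

# Hypothetical repeated measures: each plot follows a growth line whose
# slope depends on the treatment, plus measurement noise.
slopes = 1.0 + np.array([0.0, 0.3, 0.6])[groups]
Y = 2.0 + np.outer(slopes, time) + rng.normal(scale=0.5, size=(groups.size, n_times))

# PCA across the eight measurement dates, keeping the first two components.
Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
vals, vecs = np.linalg.eigh(np.corrcoef(Y, rowvar=False))
order = np.argsort(vals)[::-1]
scores = Z @ vecs[:, order[:2]]

# One ordinary ANOVA per retained component instead of a repeated-measures model.
f_stats = [one_way_f(scores[:, k], groups) for k in range(scores.shape[1])]
print([round(float(f), 2) for f in f_stats])
```

Because the PC scores are uncorrelated with each other, each ANOVA addresses a separate pattern in the time series, which is also how the analysis retains several independent signals rather than collapsing them into one.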
In sum, this editorial is not just an insightful guide for agronomists seeking to apply PCA to complex datasets and enhance their multivariate analyses; it also reminds us that while statistical tools evolve, a deep understanding of their application in context is something only experience can teach. By sharing his insights, Cory Matthew provides both a theoretical background and practical guidance, ensuring that future scientists can stand on solid ground, not only in technique, but in judgment.
About the journal:
After critical review and approval by the editorial board, AJ publishes articles reporting research findings in soil–plant relationships; crop science; soil science; biometry; crop, soil, pasture, and range management; crop, forage, and pasture production and utilization; turfgrass; agroclimatology; agronomic models; integrated pest management; integrated agricultural systems; and various aspects of entomology, weed science, animal science, plant pathology, and agricultural economics as applied to production agriculture.
Notes are published about apparatus, observations, and experimental techniques. Observations usually are limited to studies and reports of unrepeatable phenomena or other unique circumstances. Review and interpretation papers are also published, subject to standard review. Contributions to the Forum section deal with current agronomic issues and questions in brief, thought-provoking form. Such papers are reviewed by the editor in consultation with the editorial board.