Kasper Krijnen, Paul Blenkinsopp, Ron M A Heeren, Ian G M Anthony
Journal of the American Society for Mass Spectrometry. Published 2024-10-28. DOI: 10.1021/jasms.4c00314 (https://doi.org/10.1021/jasms.4c00314)
Processing Next-Generation Mass Spectrometry Imaging Data: Principal Component Analysis at Scale.
Mass spectrometry imaging (MSI) is constantly improving in spatial resolving power, throughput, and mass resolution. Although beneficial, these improvements increase data set size and content. Larger data sets require correspondingly fast computer-based analyses. However, these analyses often do not scale well with increased data size. Principal component analysis (PCA) is an important analytical tool commonly used with MSI data; however, most PCA algorithms load and process the entire data set within random access memory (RAM), which is most often insufficient for large data sets. PCA algorithms that use less RAM than the data set occupies exist, but they are usually much slower or sacrifice precision, and they are rarely used for MSI data processing. Incremental PCA (IPCA) is an alternative algorithm that avoids large RAM allocations while also preserving speed and analytical precision. Here, we demonstrate and benchmark the use of differing implementations of IPCA, PCA, and commercial software on large and often complex MSI data sets. We show that, using an already-published Python-based IPCA algorithm, IPCA can be successfully applied to MSI data sets too large to fit in RAM. Furthermore, our benchmarks demonstrate that, contrary to expectations, IPCA is faster than all other tested PCA implementations on all large data sets that can be directly compared.
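To make the idea of avoiding large RAM allocations concrete, the following is a minimal sketch of chunked IPCA. It assumes scikit-learn's `IncrementalPCA` and a NumPy memory-mapped file standing in for an on-disk MSI data set; the abstract does not name the Python implementation the authors used, so this choice is an assumption. The helper names (`fit_ipca_in_chunks`, `demo`), the synthetic data, the array sizes, and the chunk size are all hypothetical and chosen only for illustration.

```python
import os
import tempfile

import numpy as np
from sklearn.decomposition import IncrementalPCA


def fit_ipca_in_chunks(spectra, n_components, chunk_size):
    """Fit IPCA on a (pixels x m/z bins) array, reading one row chunk at a time."""
    ipca = IncrementalPCA(n_components=n_components)
    for start in range(0, spectra.shape[0], chunk_size):
        # Only this slice is pulled from disk into RAM.
        chunk = np.asarray(spectra[start:start + chunk_size])
        # Skip a trailing chunk smaller than n_components;
        # some scikit-learn versions reject such a batch.
        if chunk.shape[0] >= n_components:
            ipca.partial_fit(chunk)
    return ipca


def demo():
    """Write synthetic low-rank 'spectra' to disk, then fit IPCA from the file."""
    rng = np.random.default_rng(0)
    n_pixels, n_bins, rank, chunk = 2000, 50, 3, 500  # hypothetical sizes
    loadings = rng.normal(size=(rank, n_bins))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "spectra.dat")
        disk = np.memmap(path, dtype=np.float32, mode="w+",
                         shape=(n_pixels, n_bins))
        # Fill the file chunk by chunk so the full matrix never sits in RAM.
        for start in range(0, n_pixels, chunk):
            scores = rng.normal(size=(chunk, rank))
            noise = 0.1 * rng.normal(size=(chunk, n_bins))
            disk[start:start + chunk] = scores @ loadings + noise
        disk.flush()
        del disk
        # Reopen read-only and stream it through IPCA.
        spectra = np.memmap(path, dtype=np.float32, mode="r",
                            shape=(n_pixels, n_bins))
        ipca = fit_ipca_in_chunks(spectra, n_components=rank, chunk_size=chunk)
        del spectra
    return ipca


if __name__ == "__main__":
    model = demo()
    print(model.components_.shape, model.n_samples_seen_)
```

In this sketch, peak memory is governed by the chunk size rather than by the total number of pixels, which is the property that lets IPCA handle data sets larger than RAM. Whether it is also faster than in-memory PCA, as the abstract reports for the authors' benchmarks, depends on the data and hardware and is not shown here.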
About the journal:
The Journal of the American Society for Mass Spectrometry presents research papers covering all aspects of mass spectrometry, incorporating coverage of fields of scientific inquiry in which mass spectrometry can play a role.
Comprehensive in scope, the journal publishes papers on both fundamentals and applications of mass spectrometry. Fundamental subjects include instrumentation principles, design, and demonstration; structures and chemical properties of gas-phase ions; studies of thermodynamic properties; ion spectroscopy; chemical kinetics; mechanisms of ionization; theories of ion fragmentation; cluster ions; and potential energy surfaces. In addition to full papers, the journal offers Communications, Application Notes, and Accounts and Perspectives.