Processing Next-Generation Mass Spectrometry Imaging Data: Principal Component Analysis at Scale

IF 3.1; Zone 2 (Chemistry); JCR Q2, Biochemical Research Methods
Kasper Krijnen, Paul Blenkinsopp, Ron M A Heeren, Ian G M Anthony
Journal of the American Society for Mass Spectrometry. Journal Article, published 2024-10-28. DOI: 10.1021/jasms.4c00314 (https://doi.org/10.1021/jasms.4c00314)
Citations: 0

Abstract

Mass spectrometry imaging (MSI) is constantly improving in spatial resolving power, throughput, and mass resolution. Although beneficial, these improvements increase data set size and content. Larger data sets require correspondingly fast computer-based analyses; however, these analyses often do not scale well with increased data size. Principal component analysis (PCA) is an important analytical tool commonly used with MSI data; however, most PCA algorithms load and process the entire data set within random access memory (RAM), which is most often insufficient for large data sets. PCA algorithms that need less RAM than the data set occupies exist, but they are usually much slower or sacrifice precision, and they are rarely used for MSI data processing. Incremental PCA (IPCA) is an alternative algorithm that avoids large RAM allocations while also preserving speed and analytical precision. Here, we demonstrate and benchmark the use of differing implementations of IPCA, PCA, and commercial software on large and often complex MSI data sets. We show that, using an already-published Python-based IPCA algorithm, IPCA can be successfully applied to MSI data sets too large to fit within RAM. Furthermore, our benchmarks demonstrate that, contrary to expectations, IPCA is faster than all other tested PCA implementations on all large data sets that can be directly compared.
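
To illustrate how an incremental PCA can avoid holding the whole data set in RAM, the following is a minimal NumPy sketch of one common batch-update scheme: each chunk of pixels is merged into a running singular value decomposition, so only one chunk is resident at a time. It is not the implementation benchmarked in the paper, which the abstract does not name. The class `TinyIncrementalPCA`, the synthetic 400 x 8 array, and the chunk size of 50 are all hypothetical choices made for illustration. Established implementations exist, for example `IncrementalPCA` in scikit-learn, and would normally be preferred.

```python
import numpy as np


class TinyIncrementalPCA:
    """Illustrative incremental PCA: merge one batch at a time into a running SVD."""

    def __init__(self, n_components):
        self.n_components = n_components
        self.n_seen = 0              # rows (pixels) processed so far
        self.mean = None             # running per-channel mean
        self.components = None       # rows are principal axes
        self.singular_values = None

    def partial_fit(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        n_batch = batch.shape[0]
        batch_mean = batch.mean(axis=0)
        centered = batch - batch_mean
        if self.n_seen == 0:
            stacked = centered
            new_mean = batch_mean
        else:
            n_total = self.n_seen + n_batch
            new_mean = (self.n_seen * self.mean + n_batch * batch_mean) / n_total
            # Extra row accounting for the shift between the old mean and the batch mean.
            correction = np.sqrt(self.n_seen * n_batch / n_total) * (self.mean - batch_mean)
            # Previous summary (scaled axes) + new centered rows + mean correction.
            stacked = np.vstack([
                self.singular_values[:, None] * self.components,
                centered,
                correction,
            ])
        _, s, vt = np.linalg.svd(stacked, full_matrices=False)
        self.components = vt[: self.n_components]
        self.singular_values = s[: self.n_components]
        self.mean = new_mean
        self.n_seen += n_batch
        return self

    def transform(self, batch):
        """Project rows onto the current principal axes (scores)."""
        return (np.asarray(batch, dtype=np.float64) - self.mean) @ self.components.T


rng = np.random.default_rng(0)
# Hypothetical stand-in for an MSI data set: 400 pixels x 8 m/z channels.
data = rng.normal(size=(400, 8)) @ rng.normal(size=(8, 8)) + 5.0

ipca = TinyIncrementalPCA(n_components=8)
for start in range(0, data.shape[0], 50):
    ipca.partial_fit(data[start:start + 50])  # only this chunk is needed in memory

# Reference: conventional PCA via SVD of the fully loaded, centered data.
reference = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
print(np.allclose(ipca.singular_values, reference))
```

In this sketch all eight components are kept, so the incremental result can be compared directly with the in-memory SVD. When fewer components are kept than the data has channels, each update discards the trailing directions and the result becomes an approximation. In practice the chunks would be read from disk (for instance through a memory-mapped array) rather than sliced from an array that is already in RAM.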

Source journal
CiteScore: 5.50
Self-citation rate: 9.40%
Articles published: 257
Review time: 1 month
Journal description: The Journal of the American Society for Mass Spectrometry presents research papers covering all aspects of mass spectrometry, incorporating coverage of fields of scientific inquiry in which mass spectrometry can play a role. Comprehensive in scope, the journal publishes papers on both fundamentals and applications of mass spectrometry. Fundamental subjects include instrumentation principles, design, and demonstration, structures and chemical properties of gas-phase ions, studies of thermodynamic properties, ion spectroscopy, chemical kinetics, mechanisms of ionization, theories of ion fragmentation, cluster ions, and potential energy surfaces. In addition to full papers, the journal offers Communications, Application Notes, and Accounts and Perspectives.