Essential Number of Principal Components and Nearly Training-Free Model for Spectral Analysis

IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-02 | DOI: 10.1109/TPAMI.2024.3436860

Yifeng Bie; Shuai You; Xinrui Li; Xuekui Zhang; Tao Lu

Volume 46, Issue 12, Pages 9714-9726 | Full text: https://ieeexplore.ieee.org/document/10620616/

Citations: 0

Abstract


Learning-enabled spectroscopic analysis, which promises automated real-time analysis of chemicals, faces several challenges. First, a typical machine learning model requires a large number of training samples, which physical systems cannot provide. Second, it requires the testing samples to lie within the range of the training samples, which is often not the case in the real world. Further, a spectroscopy device is limited by its memory size, computing power, and battery capacity, so on-site analysis requires highly efficient learning models. In this paper, by analyzing multi-gas mixtures and multi-molecule suspensions, we first show that the data dimension can be reduced by orders of magnitude, because the number of principal components that need to be retained equals the number of independent constituents in the mixture. From this principle, we designed highly compact models in which the essential principal components can be extracted directly from the interrelations between the individual chemical properties and the principal components, and only a few training samples are required. Our model can predict constituent concentrations that were not seen in the training dataset and provide estimates of measurement noise. This approach can be extended into an effectively standardized method for principal component extraction.
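
The dimension-reduction claim in the abstract can be illustrated with a small noise-free toy example. The sketch below is not the authors' model. It assumes a linear (Beer-Lambert-like) mixing rule, made-up pure spectra (`PURE_SPECTRA`) and made-up concentrations (`CONCENTRATIONS`) that vary independently. Under those assumptions, the number of principal components with nonzero variance equals the rank of the mean-centered data matrix, which the code computes exactly with rational arithmetic instead of a numerical eigendecomposition.

```python
from fractions import Fraction

# Hypothetical pure-constituent spectra over 8 spectral channels (illustrative values).
PURE_SPECTRA = [
    [Fraction(v) for v in (1, 3, 5, 3, 1, 0, 0, 0)],
    [Fraction(v) for v in (0, 0, 1, 4, 6, 4, 1, 0)],
    [Fraction(v) for v in (0, 1, 0, 0, 2, 5, 7, 2)],
]

# Hypothetical concentrations of the 3 constituents in 6 measured mixtures.
CONCENTRATIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0), (0, 2, 3)]


def mix_spectra(pure_spectra, concentrations):
    """Assumed linear mixing: each spectrum is a concentration-weighted sum of pure spectra."""
    n_channels = len(pure_spectra[0])
    return [
        [sum(c * s[j] for c, s in zip(conc, pure_spectra)) for j in range(n_channels)]
        for conc in concentrations
    ]


def mean_center(rows):
    """Subtract the per-channel mean, as PCA does before extracting components."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def matrix_rank(rows):
    """Exact rank via Gaussian elimination (entries are Fractions, so no tolerance is needed)."""
    m = [row[:] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


spectra = mix_spectra(PURE_SPECTRA, CONCENTRATIONS)
# Rank of the centered data = number of principal components with nonzero variance.
n_components = matrix_rank(mean_center(spectra))
print(f"{len(spectra[0])} channels, {n_components} nonzero-variance principal components")
```

With real measurements, noise makes every component's variance nonzero, so the count would have to come from a threshold on the explained variance rather than an exact rank; that step is outside this sketch.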


Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence

Self-citation rate

0.00%

Number of publications