Selection of important samples for calibration model updating based on kernel coefficients

IF 3.8 | Zone 2, Chemistry | JCR Q2, AUTOMATION & CONTROL SYSTEMS

Chemometrics and Intelligent Laboratory Systems | Pub Date: 2025-06-19 | DOI: 10.1016/j.chemolab.2025.105472

Zhongjiang He, Zhonghai He, Xiaofang Zhang

Volume 264, Article 105472 | https://www.sciencedirect.com/science/article/pii/S0169743925001571

Citations: 0

Abstract

The predictive performance of partial least squares (PLS) models often deteriorates for new samples that fall outside the application domain, so the models must be updated. The conventional approach augments the calibration set with new samples to accommodate the changes. However, the large number of samples in the old calibration set hampers the efficiency of updating. To speed up updating, some old samples should be deleted so that the ratio of old to new samples stays balanced. Surprisingly, how to select which old samples to keep has so far received little discussion. This paper introduces Kernel Coefficient Selection (KCS), a method that obtains a coefficient for each sample in the calibration model and uses it to identify the important old samples: a large coefficient indicates that a sample is significant and should be retained. The rationale rests on the duality between two types of models, similarity-based and non-similarity-based. Removing samples with small coefficients from the calibration set still preserves the integrity of the main regression model. Comparative experiments on simulated and real soybean meal datasets compared models that retained only a subset of important samples with models that used all samples. The results show that KCS mitigates the imbalance between the numbers of old and new samples and thereby improves the efficiency of model updating.
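The abstract does not give the formula for the per-sample coefficients, so the sketch below rests on one stated assumption: the "kernel coefficients" are taken to be the dual coefficients alpha of a single-response PLS model, i.e. the minimum-norm solution of Xc^T alpha = b, where Xc is the mean-centred old calibration matrix and b the PLS regression vector. Such an alpha exists because the PLS regression vector lies in the row space of Xc, which is one concrete reading of the duality between the similarity-based (kernel) and non-similarity-based (regression vector) forms of the same model. The function names (pls1_fit, kernel_coefficients, select_important, update_model) and the "keep the k largest |alpha|" rule are hypothetical illustrations, not the authors' published implementation.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model (NIPALS) and return (b, b0, x_mean)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    E = X - x_mean  # centred, progressively deflated X
    f = y - y_mean  # centred, progressively deflated y
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)  # unit weight vector
        t = E @ w  # scores
        tt = t @ t
        p = E.T @ t / tt  # X loadings
        q = f @ t / tt  # y loading (scalar)
        E = E - np.outer(t, p)  # deflate X
        f = f - q * t  # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)  # b = W (P^T W)^{-1} q
    b0 = y_mean - x_mean @ b
    return b, b0, x_mean


def kernel_coefficients(X, b, x_mean):
    """ASSUMED dual form: minimum-norm alpha with (X - x_mean)^T alpha = b."""
    Xc = X - x_mean
    alpha, *_ = np.linalg.lstsq(Xc.T, b, rcond=None)
    return alpha


def select_important(alpha, k):
    """Hypothetical rule: indices of the k samples with the largest |alpha|."""
    return np.argsort(-np.abs(alpha))[:k]


def update_model(X_old, y_old, X_new, y_new, n_comp, k):
    """Keep k important old samples, add all new samples, and refit PLS."""
    b, _, x_mean = pls1_fit(X_old, y_old, n_comp)
    alpha = kernel_coefficients(X_old, b, x_mean)
    keep = select_important(alpha, k)
    X_upd = np.vstack([X_old[keep], X_new])
    y_upd = np.concatenate([y_old[keep], y_new])
    return pls1_fit(X_upd, y_upd, n_comp), keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data with more variables than samples, as is typical of spectra.
    X_old = rng.normal(size=(120, 300))
    y_old = X_old[:, :5].sum(axis=1) + 0.1 * rng.normal(size=120)
    X_new = rng.normal(size=(20, 300)) + 0.5  # shifted, out-of-domain samples
    y_new = X_new[:, :5].sum(axis=1)
    (b, b0, _), keep = update_model(X_old, y_old, X_new, y_new, n_comp=5, k=40)
    print(len(keep), b.shape)  # prints: 40 (300,)
```

Here k sets the old-to-new balance directly (40 retained old samples against 20 new ones in the demo). Ranking by the magnitude of alpha rather than its signed value is also an assumption of this sketch.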

Source journal

Chemometrics and Intelligent Laboratory Systems (Engineering & Technology: Analytical Chemistry)

CiteScore: 7.50
Self-citation rate: 7.70%
Articles published: 169
Review time: 3.4 months

Journal description: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on the development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well characterized data sets to test performance for the new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.