Selection of important samples for calibration model updating based on kernel coefficients
Authors: Zhongjiang He, Zhonghai He, Xiaofang Zhang
Journal: Chemometrics and Intelligent Laboratory Systems, volume 264, Article 105472
Publication date: 2025-06-19
Publication type: Journal Article
Impact factor: 3.7 (JCR Q2, Automation & Control Systems)
DOI: 10.1016/j.chemolab.2025.105472
URL: https://www.sciencedirect.com/science/article/pii/S0169743925001571
The predictive performance of PLS models often degrades for new samples that fall outside the application domain, so the models need updating. The conventional approach augments the calibration set with new samples to accommodate the change. However, updating is inefficient when the old calibration set contains a large number of samples. To speed up updating, some old samples should be removed so that the numbers of old and new samples stay balanced. So far, there has been little discussion of how to select the old samples to keep. This paper introduces Kernel Coefficient Selection (KCS), a method that obtains a coefficient for each sample in the calibration model and uses these coefficients to identify the crucial old samples. A large coefficient indicates that the sample is important and should be retained. The rationale rests on the duality between two types of models (similarity-based and non-similarity-based). Removing samples with small coefficients from the calibration set still preserves the integrity of the main regression model. Models that retained only the important samples were compared with models built on all samples, using both simulated and real soybean meal datasets. The experimental results show that KCS mitigates the imbalance in number between old and new samples, thereby improving the efficiency of model updating.
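The idea of ranking calibration samples by a per-sample coefficient can be illustrated with a small sketch. This is not the authors' KCS procedure, which the abstract does not spell out. It assumes a linear-kernel ridge model as a stand-in for PLS, where the dual coefficients `alpha` satisfy `b = X.T @ alpha`. The function names, the `ridge` parameter, the synthetic data, and the choice of keeping 20 samples are all hypothetical.

```python
import numpy as np


def dual_coefficients(X, y, ridge=1e-3):
    """Per-sample coefficients alpha of a linear-kernel ridge model.

    Solves (K + ridge * I) alpha = y with K = X X^T, so the usual
    regression vector can be recovered as b = X^T alpha.
    (Assumption: ridge regression stands in for the PLS model.)
    """
    K = X @ X.T
    return np.linalg.solve(K + ridge * np.eye(len(y)), y)


def select_important(X, y, n_keep, ridge=1e-3):
    """Return sorted indices of the n_keep samples with the largest |alpha|."""
    alpha = dual_coefficients(X, y, ridge)
    order = np.argsort(-np.abs(alpha))
    return np.sort(order[:n_keep])


# Synthetic "old" calibration set: 60 samples, 100 variables (hypothetical data).
rng = np.random.default_rng(0)
X_old = rng.normal(size=(60, 100))
b_true = rng.normal(size=100)
y_old = X_old @ b_true + 0.05 * rng.normal(size=60)

# Synthetic "new" samples that the updated model should accommodate.
X_new = rng.normal(loc=0.5, size=(20, 100))
y_new = X_new @ b_true + 0.05 * rng.normal(size=20)

# Keep only the old samples with the largest coefficients.
keep = select_important(X_old, y_old, n_keep=20)

# Updated calibration set: retained old samples plus the new ones.
X_upd = np.vstack([X_old[keep], X_new])
y_upd = np.concatenate([y_old[keep], y_new])

# Regression vector recovered from the dual coefficients of the full old set.
alpha_old = dual_coefficients(X_old, y_old)
b_from_dual = X_old.T @ alpha_old
```

Here the updated set holds 20 old and 20 new samples, so the two groups are balanced in number. How many old samples to retain, and how the coefficients are derived from a PLS model specifically, are choices the paper itself would have to settle.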
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, Systems Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well-characterized data sets for testing the performance of new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.