Sample selection method using near-infrared spectral information entropy as similarity criterion for constructing and updating peach firmness and soluble solids content prediction models
Authors: Yande Liu, Cong He, Xiaogang Jiang
Journal: Journal of Chemometrics
Published: 2023-12-19
DOI: 10.1002/cem.3528
URL: https://onlinelibrary.wiley.com/doi/10.1002/cem.3528
Abstract
Model construction and ongoing maintenance are both essential when near-infrared (NIR) techniques are used for analysis. In machine-learning model construction, the sample set is usually divided into a calibration set and a validation set. The representativeness of the calibration set and the reasonable distribution of the validation set affect the accuracy of the resulting model. In addition, when maintaining and updating models, selecting the most informative update samples not only improves prediction accuracy but also reduces sample preparation. In this paper, spectral information entropy (SIE) is proposed as a similarity criterion both for dividing the sample set and for selecting update samples. The Kennard–Stone (KS) method and the sample set partitioning based on joint x–y distance (SPXY) method were used for comparison to verify the superiority of the proposed method. The results showed that models built after dividing the sample set with the SIE method predicted better than those built with the KS and SPXY methods. When predicting soluble solids content (SSC) and hardness, the coefficient of determination of prediction ($R_P^2$) improved by more than 15%, and the root mean square error (RMSE) of prediction was reduced by 50%. In model updating, selecting a small number of update samples with the SIE method can raise the correlation coefficient of prediction ($R_P$) to more than 80%, and the updated models' prediction accuracy is higher than that obtained with the KS and SPXY methods. These results confirm that the SIE method can make the NIR analysis technique more reliable.
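The Kennard–Stone baseline and the two reported metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `kennard_stone`, `rmse`, and `r2` are hypothetical, the data are synthetic, and the coefficient of determination is the usual textbook definition, assumed to correspond to the $R_P^2$ reported above. The SIE criterion itself is not shown because the abstract does not define it.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X picked by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances (O(n^2 * p) memory; fine for a small sketch).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    # min_dist[k] = distance from sample k to its nearest selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    while len(selected) < n_select:
        min_dist[selected] = -1.0  # never re-pick an already selected sample
        k = int(np.argmax(min_dist))  # farthest from the selected set
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
    return selected


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 50))  # synthetic stand-in for 30 spectra
    cal_idx = kennard_stone(X, 20)  # calibration set
    val_idx = [k for k in range(30) if k not in cal_idx]  # validation set
    print(len(cal_idx), len(val_idx))
```

Kennard–Stone uses only the spectra (x), whereas SPXY, as its name states, uses a joint x–y distance; adapting the sketch to SPXY would mean replacing `dist` with such a combined distance.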
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.