Enhancing lettuce classification: Optimizing spectral wavelength selection via CCARS and PLS-DA
Nicola Dilillo, Andrea Sanna, Elena Belcore, Kyra Smith, Marco Piras, Bartolomeo Montrucchio, Renato Ferrero
Smart Agricultural Technology, vol. 11, Article 100962. Published 2025-04-24. DOI: 10.1016/j.atech.2025.100962
https://www.sciencedirect.com/science/article/pii/S2772375525001959
Abstract
Spectroscopy is a valuable tool for analyzing the internal condition of plants. In this field, plant health is evaluated through light analysis, specifically by examining wavelengths beyond the visible spectrum, which makes it essential to select the most representative wavelengths. The Competitive Adaptive Reweighted Sampling (CARS) algorithm has been applied efficiently in the literature to select the best variables in several applications, including agricultural monitoring, nutrient analysis, and chemometrics. This study presents the Calibrated CARS (CCARS) algorithm, an extension of CARS, used together with the Partial Least Squares Discriminant Analysis (PLS-DA) model. The algorithm is designed to identify the most informative wavelengths in a spectral dataset of lettuce, enabling streamlined and efficient models for lettuce health classification. Although PLS-DA models are effective with spectral data, they tend to overfit; to address this problem, a rigorous, systematic evaluation approach is employed. Permutation tests verify the model's robustness, while learning curve analyses assess its capacity to generalize to new data. This comprehensive evaluation provides confidence in the robustness and stability of the PLS-DA models, which is achieved with the CCARS algorithm rather than the original CARS. The results demonstrate that using CCARS with 3 or 4 PLS components and only 30 or 19 selected wavelengths reduces the number of variables by 97% without sacrificing accuracy, while yielding a statistically significant, robust model.
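The abstract does not say how the permutation tests were run, so the following is only a minimal sketch of the general idea, under stated assumptions: class labels are shuffled many times, the model is re-scored on each shuffle, and the p-value is the fraction of shuffled scores that match or exceed the real one. A leave-one-out nearest-centroid classifier stands in for PLS-DA to keep the example dependency-free. The function names, the synthetic data, and the number of permutations are all hypothetical and do not come from the paper.

```python
import random


def nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumption: this simple classifier is a stand-in for PLS-DA.
    """
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from all samples except the held-out one.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(X[i], centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def permutation_test(X, y, score_fn, n_permutations=200, seed=0):
    """Return the observed score and a permutation p-value for it."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any real link between spectra and labels
        null_scores.append(score_fn(X, shuffled))
    # The +1 terms count the observed labelling as one of the permutations.
    exceed = sum(score >= observed for score in null_scores)
    p_value = (1 + exceed) / (n_permutations + 1)
    return observed, p_value


# Hypothetical synthetic data: 40 samples, 5 "wavelengths", two shifted classes.
data_rng = random.Random(42)
X = [
    [data_rng.gauss(0.0, 1.0) + (3.0 if i >= 20 else 0.0) for _ in range(5)]
    for i in range(40)
]
y = [0] * 20 + [1] * 20

observed, p_value = permutation_test(X, y, nearest_centroid_accuracy)
print(f"observed accuracy = {observed:.3f}, permutation p-value = {p_value:.4f}")
```

A small p-value means the observed accuracy is unlikely to arise when the labels carry no information, which is the sense in which a permutation test supports a claim of statistical significance. In practice, the stand-in scorer would be replaced by a cross-validated PLS-DA model fitted on the selected wavelengths.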