Improving accuracy and generalization in single kernel oil characteristics prediction in maize using NIR-HSI and a knowledge-injected spectral tabtransformer

Impact Factor: 12.4 · JCR Q1 (Agriculture, Multidisciplinary)

Artificial Intelligence in Agriculture · Pub Date: 2025-06-11 · DOI: 10.1016/j.aiia.2025.05.007

Anran Song, Xinyu Guo, Weiliang Wen, Chuanyu Wang, Shenghao Gu, Xiaoqian Chen, Juan Wang, Chunjiang Zhao

Volume 15, Issue 4, Pages 802-815. Full text: https://www.sciencedirect.com/science/article/pii/S2589721725000625

Citations: 0

Abstract

Near-infrared spectroscopy hyperspectral imaging (NIR-HSI) is widely used for seed component prediction due to its non-destructive and rapid nature. However, existing models often suffer from limited generalization, particularly when trained on small datasets, and there is a lack of effective deep learning (DL) models for spectral data analysis. To address these challenges, we propose the Knowledge-Injected Spectral TabTransformer (KIT-Spectral TabTransformer), an innovative adaptation of the traditional TabTransformer specifically designed for maize seeds. By integrating domain-specific knowledge, this approach enhances model training efficiency and predictive accuracy while reducing reliance on large datasets. The generalization capability of the model was rigorously validated through ten-fold cross-validation (10-CV). Compared to traditional machine learning methods, the attention-based CNN (ACNNR), and the Oil Characteristics Predictor Transformer (OCP-Transformer), the KIT-Spectral TabTransformer demonstrated superior performance in oil mass prediction, achieving $R_p^2$ = 0.9238 ± 0.0346, $\mathrm{RMSE}_p$ = 0.1746 ± 0.0401. For oil content, $R_p^2$ = 0.9602 ± 0.0180 and $\mathrm{RMSE}_p$ = 0.5301 ± 0.1446 on a dataset with oil content ranging from 0.81% to 13.07%. On the independent validation set, our model achieved $R^2$ values of 0.7820 and 0.7586, along with RPD values of 2.1420 and 2.0355 in the two tasks, highlighting its strong prediction capability and potential for real-world application. These findings offer a potential method and direction for single seed oil prediction and related crop component analysis.
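The abstract reports its metrics as mean ± standard deviation over ten folds, plus R² and RPD on an independent set. The sketch below shows one common way such figures are computed. It is not the authors' code: the one-variable least-squares line is a hypothetical stand-in for the KIT-Spectral TabTransformer, all function names are illustrative, and RPD is assumed here to be the sample standard deviation of the reference values divided by the prediction RMSE (conventions for the denominator vary between papers).

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rpd(y_true, y_pred):
    """Assumed convention: sample SD (n - 1) of reference values / RMSE."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_y) ** 2 for t in y_true) / (n - 1))
    return sd / rmse(y_true, y_pred)


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mean_std(values):
    """Mean and sample standard deviation across folds."""
    m = sum(values) / len(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return m, s


def fit_line(x, y):
    """Hypothetical stand-in model: ordinary least squares on one feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validate(x, y, k=10, seed=0):
    """Return ((mean, sd) of R2_p, (mean, sd) of RMSE_p) over k folds."""
    r2s, rmses = [], []
    for test_idx in kfold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [slope * x[i] + intercept for i in test_idx]
        r2s.append(r2_score(y_true, y_pred))
        rmses.append(rmse(y_true, y_pred))
    return mean_std(r2s), mean_std(rmses)


if __name__ == "__main__":
    # Synthetic data only; these are not the paper's measurements.
    rng = random.Random(1)
    x = [rng.uniform(0.0, 10.0) for _ in range(200)]
    y = [0.5 * xi + 1.0 + rng.gauss(0.0, 0.3) for xi in x]
    (r2_m, r2_s), (rmse_m, rmse_s) = cross_validate(x, y)
    print(f"R2_p = {r2_m:.4f} ± {r2_s:.4f}, RMSE_p = {rmse_m:.4f} ± {rmse_s:.4f}")
```

Reporting the spread across folds alongside the mean, as the abstract does, shows how sensitive the model is to the particular train/test split, which matters most for small datasets.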
