The research on enhancing LA estimation accuracy across domains for small sample data based on data augmentation and data transfer integration optimization system

Impact factor: 5.7 · JCR: Q1 (Agricultural Engineering)

Smart Agricultural Technology · Pub Date: 2025-07-01 · DOI: 10.1016/j.atech.2025.101148

Ai-Dong Wang, Rui-Jie Li, Xiang-Qian Feng, Zi-Qiu Li, Wei-Yuan Hong, Hua-Xing Wu, Dan-Ying Wang, Song Chen

Volume 12, Article 101148. https://www.sciencedirect.com/science/article/pii/S2772375525003806

Citations: 0

Abstract

Context

Efficient and precise monitoring of rice leaf area (LA) is essential for variety selection and agricultural management. Current LA estimation models based on high-throughput phenotyping rely mainly on large, homogeneous datasets, and they generalize poorly to heterogeneous scenarios with small sample sizes.

Objective

This study aims to develop a framework that mitigates LA prediction bias caused by limited samples and data heterogeneity. The framework integrates machine learning models to provide a general solution for cross-domain LA estimation when data are scarce.

Methods

The study uses canopy images acquired in 2023–2024 with a multi-view RGB imaging system (front and side camera positions) covering the full rice growth cycle. Fourteen morphological feature parameters are derived from the images, and leaf area is measured by destructive sampling; together these form the dataset. Six algorithms (linear regression, support vector regression, random forest, XGBoost, CatBoost, and K-nearest neighbors) are compared under a combined strategy of data augmentation (noise injection, generative adversarial networks, Gaussian mixture models, variational autoencoders) and transfer learning (random, clustering, and hierarchical parameter transfer).
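As an illustration of the simplest of the four augmentation methods listed above, noise injection, the following is a minimal sketch and not the authors' implementation. The function name `augment_with_noise`, the relative noise scale `rel_sigma`, the number of copies, and the toy feature values are all assumptions; the abstract does not specify them.

```python
import random


def augment_with_noise(rows, n_copies=2, rel_sigma=0.05, seed=0):
    """Return the original rows plus noisy copies of each row.

    rows: list of (features, leaf_area) pairs; features is a list of floats.
    rel_sigma: noise std as a fraction of each feature's std (assumed value).
    """
    rng = random.Random(seed)
    n_features = len(rows[0][0])

    # The per-feature standard deviation sets the scale of the injected noise.
    stds = []
    for j in range(n_features):
        col = [feats[j] for feats, _ in rows]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        stds.append(var ** 0.5)

    augmented = list(rows)
    for _ in range(n_copies):
        for feats, target in rows:
            noisy = [v + rng.gauss(0.0, rel_sigma * s) for v, s in zip(feats, stds)]
            augmented.append((noisy, target))  # the measured LA label is kept unchanged
    return augmented


# Toy data: 3 plants with 2 hypothetical features (projected area, compactness).
plants = [([120.0, 0.61], 310.0), ([95.0, 0.58], 250.0), ([150.0, 0.70], 405.0)]
bigger = augment_with_noise(plants, n_copies=3)
print(len(bigger))  # 3 originals + 3 * 3 noisy copies = 12
```

Scaling the noise by each feature's own spread keeps the perturbation comparable across features with very different units. Augmented rows should be added to the training set only, so that validation and test scores are computed on measured plants.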

Results and conclusions

The integrated optimization system, Gaussian Mixture Model Generation with Cluster-Based Transfer (GMM-CBT), performed best when combined with XGBoost (validation R² = 0.85, test R² = 0.85). It outperformed both standalone approaches on the test set: data augmentation alone reached a slightly higher validation R² (0.87) but failed to generalize (test R² = -0.37), while transfer learning alone reached validation R² = 0.84 and test R² = 0.84. The framework clusters heterogeneous data by morphological features (such as size, compactness, and roundness) and builds a transfer sample library that covers the feature space.
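The negative test R² reported for standalone data augmentation is possible because R² = 1 - SS_res / SS_tot falls below zero whenever a model's squared error exceeds that of simply predicting the mean of the measured values. A small sketch with made-up leaf-area numbers (the values are illustrative and not taken from the paper):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # error of the mean
    return 1.0 - ss_res / ss_tot


measured = [200.0, 300.0, 400.0]  # hypothetical measured leaf areas

print(r2_score(measured, [210.0, 290.0, 405.0]))  # good fit, about 0.989
print(r2_score(measured, [300.0, 300.0, 300.0]))  # predicting the mean: 0.0
print(r2_score(measured, [400.0, 300.0, 200.0]))  # worse than the mean: -3.0
```

Read this way, a validation R² of 0.87 paired with a test R² of -0.37 indicates a model that fits data resembling its training set but does worse than a constant prediction on the held-out domain.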

Significance

The proposed methodology advances precision agriculture by enabling single-plant LA monitoring, with potential extensions to other crops and trait-phenotyping applications.

