The research on enhancing LA estimation accuracy across domains for small sample data based on data augmentation and data transfer integration optimization system
Ai-Dong Wang, Rui-Jie Li, Xiang-Qian Feng, Zi-Qiu Li, Wei-Yuan Hong, Hua-Xing Wu, Dan-Ying Wang, Song Chen
Smart Agricultural Technology, Volume 12, Article 101148. Published 2025-07-01. DOI: 10.1016/j.atech.2025.101148
https://www.sciencedirect.com/science/article/pii/S2772375525003806
Citations: 0
Abstract
Context
Efficient and precise monitoring of rice leaf area (LA) is essential for variety selection and agricultural management. At present, LA estimation models based on high-throughput phenotyping technologies depend primarily on large, homogeneous datasets. These models generalize poorly when applied to heterogeneous scenarios with small sample sizes.
Objective
This study aims to develop a novel framework that mitigates LA prediction bias caused by limited samples and data heterogeneity. The framework integrates machine learning models to establish a universal solution for cross-domain LA estimation in data-scarce situations.
Methods
This study uses canopy images acquired in 2023–2024 with a multi-view RGB imaging system (front and side camera positions) covering the full rice growth cycle. Fourteen morphological feature parameters are constructed and combined with leaf area values measured by destructive sampling to form the dataset. Six algorithms (linear regression, support vector regression, random forest, XGBoost, CatBoost, and K-nearest neighbors) are compared under a combined strategy of data augmentation (noise injection, generative adversarial networks, Gaussian mixture models, variational autoencoders) and transfer learning (random, clustering, and hierarchical parameter transfer).
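Of the augmentation methods listed above, noise injection is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: the function name `augment_with_noise`, the parameters `n_copies` and `rel_sigma`, the choice to scale the noise by each feature's standard deviation, and the toy feature values are all assumptions made for illustration. It uses only the Python standard library so that it stays self-contained.

```python
import random


def augment_with_noise(rows, n_copies=3, rel_sigma=0.05, seed=0):
    """Return the original rows plus noisy copies of each row.

    rows: list of (features, leaf_area) pairs; features is a list of floats.
    rel_sigma: noise standard deviation as a fraction of each feature's
               sample standard deviation (an illustrative choice).
    """
    rng = random.Random(seed)
    n_features = len(rows[0][0])

    # Per-feature standard deviation, so the noise scales with each feature.
    stds = []
    for j in range(n_features):
        column = [features[j] for features, _ in rows]
        mean = sum(column) / len(column)
        var = sum((v - mean) ** 2 for v in column) / len(column)
        stds.append(var ** 0.5)

    augmented = list(rows)
    for features, leaf_area in rows:
        for _ in range(n_copies):
            noisy = [v + rng.gauss(0.0, rel_sigma * s)
                     for v, s in zip(features, stds)]
            augmented.append((noisy, leaf_area))  # the label is kept unchanged
    return augmented


# Toy data: three plants, two made-up features, leaf area as the target.
samples = [([10.0, 0.80], 120.0), ([14.0, 0.65], 180.0), ([18.0, 0.55], 260.0)]
bigger = augment_with_noise(samples, n_copies=3)
print(len(bigger))  # 12 (3 originals + 9 noisy copies)
```

Keeping the measured leaf area fixed while perturbing only the features is one possible design choice; the abstract does not say whether the authors also perturb the target.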
Results and conclusions
The integrated optimization system (Gaussian Mixture Model Generation-Cluster-Based Transfer, GMM-CBT) performed best when combined with XGBoost (validation R² = 0.85, test R² = 0.85). On the test set it outperformed both standalone approaches: data augmentation (validation R² = 0.87, test R² = -0.37) and transfer learning (validation R² = 0.84, test R² = 0.84). The framework clusters heterogeneous data by morphological features (such as size, compactness, and roundness) and constructs a transfer sample library with feature coverage.
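The idea of clustering samples by morphological features and drawing a transfer sample library that covers every cluster can be sketched as follows. This is an illustration under stated assumptions, not the published method: the abstract does not name the clustering algorithm, so plain k-means with farthest-point initialisation is assumed here, and the helper names (`init_centers`, `kmeans`, `build_transfer_library`), the parameters `k` and `per_cluster`, and the toy feature values are all hypothetical. Only the standard library is used; in practice a library such as scikit-learn would normally supply the clustering step.

```python
import math
import random


def init_centers(points, k):
    """Farthest-point initialisation: spread the starting centers out."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def build_transfer_library(features, k=3, per_cluster=2, seed=0):
    """Pick up to `per_cluster` sample indices from every cluster."""
    labels = kmeans(features, k)
    rng = random.Random(seed)
    library = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)
        library.extend(members[:per_cluster])
    return sorted(library), labels


# Toy features on a common 0-1 scale: [size, compactness, roundness].
source = [
    [0.10, 0.20, 0.15], [0.12, 0.22, 0.18], [0.11, 0.19, 0.14],  # small plants
    [0.50, 0.55, 0.50], [0.52, 0.53, 0.48], [0.49, 0.56, 0.51],  # medium plants
    [0.90, 0.85, 0.88], [0.92, 0.87, 0.90], [0.91, 0.84, 0.86],  # large plants
]
library, labels = build_transfer_library(source, k=3, per_cluster=2)
print(library)  # indices of the samples chosen for the transfer library
```

Sampling the same number of plants from every cluster is what gives the library its coverage of the feature space, in contrast to random selection, which can miss sparsely populated morphologies. Real features on different scales would need standardising before distances are computed.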
Significance
The proposed methodology advances precision agriculture by enabling single-plant LA monitoring, with potential extensions to other crops and trait-phenotyping applications.