Breaking the barrier of human-annotated training data for machine learning-aided plant research using aerial imagery

IF 6.5 | Zone 1 (Biology) | JCR Q1, Plant Sciences
Sebastian Varela, Xuying Zheng, Joyce Njuguna, Erik Sacks, Dylan Allen, Jeremy Ruhter, Andrew D B Leakey
Plant Physiology, published 2025-04-23. DOI: 10.1093/plphys/kiaf132

Abstract

Machine learning (ML) can accelerate biological research. However, the adoption of such tools to facilitate phenotyping based on sensor data has been limited by (i) the need for a large amount of human-annotated training data for each context in which the tool is used and (ii) phenotypes varying across contexts defined in terms of genetics and environment. This is a major bottleneck because acquiring training data is generally costly and time-consuming. This study demonstrates how an ML approach can address these challenges by minimizing the amount of human supervision needed for tool building. A case study was performed to compare ML approaches that examine images collected by an uncrewed aerial vehicle to determine the presence/absence of panicles (i.e. “heading”) across thousands of field plots containing genetically diverse breeding populations of 2 Miscanthus species. Automated analysis of aerial imagery enabled the identification of heading approximately 9 times faster than in-field visual inspection by humans. Leveraging an Efficiently Supervised Generative Adversarial Network (ESGAN) learning strategy reduced the requirement for human-annotated data by 1 to 2 orders of magnitude compared to traditional, fully supervised learning approaches. The ESGAN model learned the salient features of the data set by using thousands of unlabeled images to inform the discriminative ability of a classifier so that it required minimal human-labeled training data. This method can accelerate the phenotyping of heading date as a measure of flowering time in Miscanthus across diverse contexts (e.g. in multistate trials) and opens avenues to promote the broad adoption of ML tools.
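The abstract states that ESGAN uses thousands of unlabeled images to strengthen a classifier that then needs very few human labels, but it does not give the model's loss or architecture. As an assumed stand-in, the sketch below shows the widely used semi-supervised GAN discriminator objective in the K-logit form of Salimans et al. (2016): the classifier emits K class logits (here K = 2, a hypothetical encoding of "no panicles" vs. "heading"), labeled images contribute an ordinary cross-entropy term, and unlabeled real images plus generator outputs contribute a real/fake term computed from the same logits. All function names are illustrative and not taken from the ESGAN code.

```python
import math
from typing import Sequence

# Assumed sketch: a generic semi-supervised GAN (SGAN) discriminator loss in the
# K-logit form of Salimans et al. (2016). It is NOT the published ESGAN code.

Logits = Sequence[float]  # one vector of K class logits per image


def logsumexp(logits: Logits) -> float:
    """Numerically stable log(sum_k exp(l_k))."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def supervised_loss(labeled: Sequence[Logits], labels: Sequence[int]) -> float:
    """Mean K-class cross-entropy over the small human-labeled set."""
    return sum(logsumexp(row) - row[y] for row, y in zip(labeled, labels)) / len(labels)


def unsupervised_loss(unlabeled: Sequence[Logits], generated: Sequence[Logits]) -> float:
    """Real-vs-fake term built from the same class logits.

    With Z(x) = sum_k exp(l_k(x)), D(x) = Z / (Z + 1) = sigmoid(logsumexp(l)), so
    -log D(x) = softplus(-logsumexp(l)) and -log(1 - D(x)) = softplus(logsumexp(l)).
    """
    real = sum(softplus(-logsumexp(row)) for row in unlabeled) / len(unlabeled)
    fake = sum(softplus(logsumexp(row)) for row in generated) / len(generated)
    return real + fake


def discriminator_loss(
    labeled: Sequence[Logits],
    labels: Sequence[int],
    unlabeled: Sequence[Logits],
    generated: Sequence[Logits],
) -> float:
    """Total classifier/discriminator loss: supervised + unsupervised terms."""
    return supervised_loss(labeled, labels) + unsupervised_loss(unlabeled, generated)


if __name__ == "__main__":
    # Hypothetical logits for K = 2 classes (0 = no panicles, 1 = heading).
    labeled = [[2.0, -1.0], [-0.5, 1.5]]
    labels = [0, 1]
    unlabeled = [[1.0, 0.5], [0.2, 2.0], [-0.3, 0.1]]  # real plot images, no labels
    generated = [[-1.0, -2.0], [0.0, -0.5]]            # generator outputs
    loss = discriminator_loss(labeled, labels, unlabeled, generated)
    print(f"discriminator loss: {loss:.4f}")
```

Because the real/fake term reuses the class logits, every unlabeled plot image shapes the same features the classifier relies on, which is one plausible mechanism for needing fewer human labels in the supervised term. In a real pipeline the logits would come from a convolutional network applied to image chips, and the generator would be trained with its own objective (for example feature matching), which this sketch omits.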
Source journal: Plant Physiology (Biology / Plant Sciences)
CiteScore: 12.20
Self-citation rate: 5.40%
Articles published: 535
Review time: 2.3 months
Journal description: Plant Physiology®, established in 1926, is an international plant biology journal covering topics from the molecular and structural aspects of plant life to systems biology and ecophysiology. The journal describes itself as the most highly cited journal in plant sciences. It is the official publication of the American Society of Plant Biologists, applies rigorous peer review, and publishes 12 issues per year.