Sebastian Varela, Xuying Zheng, Joyce Njuguna, Erik Sacks, Dylan Allen, Jeremy Ruhter, Andrew D B Leakey
Title: Breaking the barrier of human-annotated training data for machine learning-aided plant research using aerial imagery
Journal: Plant Physiology (impact factor 6.5; JCR Q1, Plant Sciences)
Published: 2025-04-23
DOI: 10.1093/plphys/kiaf132 (https://doi.org/10.1093/plphys/kiaf132)
Abstract
Machine learning (ML) can accelerate biological research. However, the adoption of such tools to facilitate phenotyping based on sensor data has been limited by (i) the need for a large amount of human-annotated training data for each context in which the tool is used and (ii) phenotypes varying across contexts defined in terms of genetics and environment. This is a major bottleneck because acquiring training data is generally costly and time-consuming. This study demonstrates how an ML approach can address these challenges by minimizing the amount of human supervision needed for tool building. A case study was performed to compare ML approaches that examine images collected by an uncrewed aerial vehicle to determine the presence/absence of panicles (i.e. “heading”) across thousands of field plots containing genetically diverse breeding populations of 2 Miscanthus species. Automated analysis of aerial imagery enabled the identification of heading approximately 9 times faster than in-field visual inspection by humans. Leveraging an Efficiently Supervised Generative Adversarial Network (ESGAN) learning strategy reduced the requirement for human-annotated data by 1 to 2 orders of magnitude compared to traditional, fully supervised learning approaches. The ESGAN model learned the salient features of the data set by using thousands of unlabeled images to inform the discriminative ability of a classifier so that it required minimal human-labeled training data. This method can accelerate the phenotyping of heading date as a measure of flowering time in Miscanthus across diverse contexts (e.g. in multistate trials) and opens avenues to promote the broad adoption of ML tools.
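The abstract does not specify the ESGAN architecture or loss function. As a rough illustration of how unlabeled images can inform a classifier in a GAN setting, the sketch below uses a common formulation from the semi-supervised GAN literature, which may differ from the authors' method. A discriminator emits one logit per real class (here, hypothetically, 0 = no heading and 1 = heading), labeled images contribute a cross-entropy term, and unlabeled or generated images contribute a real-versus-generated term with D(x) = Z / (Z + 1), where Z is the sum of the exponentiated logits. All function names and the class encoding are assumptions made for this example.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(l))) over the class logits."""
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def supervised_loss(logits, label):
    """Cross-entropy over the real classes for one human-labeled image."""
    return log_sum_exp(logits) - logits[label]


def prob_real(logits):
    """D(x) = Z / (Z + 1) with Z = sum(exp(logits)); assumed formulation."""
    return math.exp(-softplus(-log_sum_exp(logits)))


def unsupervised_loss(logits, is_real):
    """Real-vs-generated loss for one image; no class label is needed.

    -log D(x)       = softplus(-log Z)  for real (unlabeled) images
    -log(1 - D(x))  = softplus(+log Z)  for generated images
    """
    lse = log_sum_exp(logits)
    return softplus(-lse) if is_real else softplus(lse)


if __name__ == "__main__":
    logits = [0.0, 0.0]  # hypothetical logits for (no heading, heading)
    print(prob_real(logits))
    print(supervised_loss(logits, label=1))
    print(unsupervised_loss(logits, is_real=True))
```

In this formulation the same logits serve both terms, so gradients from thousands of unlabeled images shape the features that the small labeled set then uses for classification.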
About the journal:
Plant Physiology® is a distinguished and highly respected journal with a rich history dating back to its establishment in 1926. It stands as a leading international publication in the field of plant biology, covering a comprehensive range of topics from the molecular and structural aspects of plant life to systems biology and ecophysiology. It is recognized as the most highly cited journal in plant sciences, reflecting its commitment to excellence and to the dissemination of groundbreaking research.
As the official publication of the American Society of Plant Biologists, Plant Physiology® upholds rigorous peer-review standards, ensuring that the scientific community receives the highest quality research. The journal releases 12 issues annually, providing a steady stream of new findings and insights to its readership.