From leaves to labels: Building modular machine learning networks for rapid herbarium specimen analysis with LeafMachine2


Applications in Plant Sciences, 11(5). Published 2023-10-16. DOI: 10.1002/aps3.11548

William N. Weaver, Stephen A. Smith



Abstract

Premise

Quantitative plant traits play a crucial role in biological research. However, traditional methods for measuring plant morphology are time-consuming and scale poorly. We present LeafMachine2, a suite of modular machine learning and computer vision tools that can automatically extract a base set of leaf traits from digital plant data sets.

Methods

LeafMachine2 was trained on 494,766 manually prepared annotations drawn from 5648 herbarium images, which were obtained from 288 institutions and represent 2663 species. It employs a set of plant component detection and segmentation algorithms to isolate individual leaves, petioles, fruits, flowers, wood samples, buds, and roots. Our landmarking network automatically identifies and measures nine pseudo-landmarks that occur on most broadleaf taxa. An archival component detector automatically identifies text labels and barcodes and prepares them for optical character recognition or natural language processing.
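
The nine pseudo-landmarks are not enumerated here, so the following is only an illustrative sketch of how pixel-space measurements can follow from landmark coordinates and a segmentation mask. It assumes a detector has already returned (x, y) pixel coordinates for hypothetical points named base, apex, width_left, width_right, apex_left, and apex_right, and that a segmentation step has produced a binary leaf mask. These names and functions are hypothetical and are not LeafMachine2's API.

```python
import math
from typing import Dict, List, Tuple

# An (x, y) pixel coordinate.
Point = Tuple[float, float]


def distance(p: Point, q: Point) -> float:
    """Euclidean distance between two pixel coordinates."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def angle_at(vertex: Point, a: Point, b: Point) -> float:
    """Angle in degrees at `vertex` between the rays vertex->a and vertex->b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("degenerate angle: a landmark coincides with the vertex")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def mask_area_px(mask: List[List[int]]) -> int:
    """Area in pixels: the number of non-zero cells in a binary segmentation mask."""
    return sum(1 for row in mask for value in row if value)


def leaf_measurements(landmarks: Dict[str, Point]) -> Dict[str, float]:
    """Pixel-space measurements keyed by HYPOTHETICAL landmark names."""
    return {
        "lamina_length_px": distance(landmarks["base"], landmarks["apex"]),
        "lamina_width_px": distance(landmarks["width_left"], landmarks["width_right"]),
        "apex_angle_deg": angle_at(
            landmarks["apex"], landmarks["apex_left"], landmarks["apex_right"]
        ),
    }


if __name__ == "__main__":
    # Toy leaf: base at the origin, apex 100 px above it (hypothetical values).
    example = {
        "base": (0.0, 0.0),
        "apex": (0.0, 100.0),
        "width_left": (-20.0, 50.0),
        "width_right": (20.0, 50.0),
        "apex_left": (-10.0, 90.0),
        "apex_right": (10.0, 90.0),
    }
    print(leaf_measurements(example))
    print(mask_area_px([[0, 1, 1], [1, 1, 0]]))
```

All values stay in pixels until a conversion factor derived from a ruler is applied.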

Results

LeafMachine2 can extract trait data from at least 245 angiosperm families and calculate pixel-to-metric conversion factors for 26 commonly used ruler types.
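
As a minimal sketch of what a pixel-to-metric conversion factor involves, assume tick marks along a ruler have already been located (as pixel positions along one axis) and that the physical spacing between adjacent ticks is known, for example 1 mm. The factor can then be estimated from the median gap between consecutive ticks. The function names are hypothetical, and this is not necessarily how LeafMachine2 computes its factors for the 26 ruler types.

```python
from statistics import median
from typing import Sequence


def pixels_per_unit(tick_positions_px: Sequence[float], tick_spacing: float) -> float:
    """Estimate pixels per unit (e.g. per mm) from detected ruler tick positions.

    HYPOTHETICAL helper: assumes ticks lie along one axis and are evenly spaced
    `tick_spacing` units apart. Using the median gap between consecutive ticks
    limits the influence of a single missed or spurious tick.
    """
    ticks = sorted(tick_positions_px)
    if len(ticks) < 2:
        raise ValueError("at least two ticks are needed to measure a spacing")
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return median(gaps) / tick_spacing


def px_to_length(length_px: float, px_per_unit: float) -> float:
    """Convert a pixel length to metric units (lengths scale linearly)."""
    return length_px / px_per_unit


def px_to_area(area_px: float, px_per_unit: float) -> float:
    """Convert a pixel area to squared metric units (areas scale quadratically)."""
    return area_px / px_per_unit ** 2


if __name__ == "__main__":
    # 1 mm ticks with one tick (near 62 px) missed by the detector.
    factor = pixels_per_unit([12.0, 22.0, 32.5, 42.0, 52.0, 72.0], tick_spacing=1.0)
    print(factor)                        # 10.0 px per mm despite the missing tick
    print(px_to_length(250.0, factor))   # 25.0 mm
    print(px_to_area(5000.0, factor))    # 50.0 mm^2
```

A mean of the gaps would give 12 px/mm here because the missed tick doubles one gap; the median ignores that outlier. Areas must be divided by the square of the factor, not the factor itself.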

Discussion

LeafMachine2 is a highly efficient tool for generating large quantities of plant trait data, even from occluded or overlapping leaves, field images, and non-archival data sets. Our project, along with similar initiatives, has made significant progress in removing the bottleneck in plant trait data acquisition from herbarium specimens and shifted the focus toward the crucial task of data revision and quality control.


