Efficient Generation of Training Libraries for Image Classification Models from Photos of Herbarium Specimens
Authors: A. Schmidt‐Lebuhn, Nunzio J. Knerr
Journal: International Journal of Plant Sciences, pp. 387-391
Published: 2023-03-08
DOI: https://doi.org/10.1086/724950
Premise of research. Computer vision has the potential to become a transformative identification tool in biodiversity research and collections management, allowing high-throughput identification and removing the need for nonexpert end users to understand technical terminology. A major bottleneck for taxonomists is the generation of sufficient numbers of training images. Contemporary large-scale imaging projects of herbaria provide an increasing number of specimen photos, but whole-sheet images are not directly suitable for training image classification models targeted at individual taxonomically informative characters.
Methodology. Here, we illustrate a time- and labor-efficient approach for generating training libraries for image classification from photos of herbarium sheets. It involves the annotation of specimen images with bounding boxes using open-source software and automated cropping of annotations with a custom script to produce the training library. We demonstrate the approach on the flower heads of a genus of Asteraceae comprising eight taxa, six species and two nontypus varieties.
Pivotal results. After generating 816 training images from 33 specimen photos with a time investment of only ∼90 min, we trained an image classification model that achieved 98.2% precision and recall.
Conclusions. The demonstrated approach allows taxonomists to use digitized herbarium specimens to produce training libraries for image classification models within hours. We expect that computer vision will increasingly become a part of taxonomic practice.
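The annotate-then-crop workflow described in the methodology can be sketched roughly as follows. This is not the authors' script, which is not reproduced here. It is a minimal illustration that assumes the bounding boxes have been exported as CSV rows with the hypothetical columns image, label, xmin, ymin, xmax, ymax; the function names, file names, and the output folder training_library are likewise invented for the example. Cropping itself relies on the third-party Pillow library, imported only when images are actually written.

```python
import csv
import io
from collections import Counter
from pathlib import Path


def clamp_box(box, width, height):
    """Clip (xmin, ymin, xmax, ymax) to the image; return None if nothing is left."""
    xmin, ymin, xmax, ymax = box
    xmin, ymin = max(0, xmin), max(0, ymin)
    xmax, ymax = min(width, xmax), min(height, ymax)
    if xmax <= xmin or ymax <= ymin:
        return None
    return (xmin, ymin, xmax, ymax)


def plan_crops(csv_text, out_dir="training_library"):
    """Turn annotation rows into (source image, box, destination path) jobs.

    Assumed (hypothetical) CSV columns: image,label,xmin,ymin,xmax,ymax.
    Each crop is filed into one subfolder per label, the layout many
    image classification tools expect for training data.
    """
    counts = Counter()
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        box = tuple(int(row[key]) for key in ("xmin", "ymin", "xmax", "ymax"))
        label = row["label"]
        counts[label] += 1  # running number keeps file names unique per label
        stem = Path(row["image"]).stem
        dest = Path(out_dir) / label / f"{stem}_{counts[label]:04d}.jpg"
        jobs.append((row["image"], box, dest))
    return jobs


def run_crops(jobs):
    """Write every planned crop to disk. Requires Pillow (pip install Pillow)."""
    from PIL import Image  # imported lazily so planning works without Pillow

    for source, box, dest in jobs:
        with Image.open(source) as sheet:
            clipped = clamp_box(box, *sheet.size)  # sheet.size is (width, height)
            if clipped is None:
                continue  # annotation lies entirely outside the sheet
            dest.parent.mkdir(parents=True, exist_ok=True)
            sheet.crop(clipped).convert("RGB").save(dest)


if __name__ == "__main__":
    # Hypothetical annotations for two specimen photos.
    annotations = (
        "image,label,xmin,ymin,xmax,ymax\n"
        "sheet_01.jpg,taxon_a,120,340,260,480\n"
        "sheet_01.jpg,taxon_a,400,90,530,215\n"
        "sheet_02.jpg,taxon_b,75,610,190,720\n"
    )
    for source, box, dest in plan_crops(annotations):
        print(source, box, dest)
```

Separating planning from cropping lets the folder layout be checked before any image is written. Because a single sheet can carry many flower heads, a few dozen annotated photos can yield several hundred training crops, in line with the ratio reported above.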
About the journal:
The International Journal of Plant Sciences has a distinguished history of publishing research in the plant sciences since 1875. IJPS presents high-quality, original, peer-reviewed research from laboratories around the world in all areas of the plant sciences. Topics covered range from genetics and genomics, developmental and cell biology, biochemistry and physiology, to morphology and anatomy, systematics, evolution, paleobotany, plant-microbe interactions, and ecology. IJPS does NOT publish papers on agriculture or crop improvement. In addition to full-length research papers, IJPS publishes review articles, including the open access Coulter Reviews, rapid communications, and perspectives. IJPS welcomes contributions that present evaluations and new perspectives on areas of current interest in plant biology. IJPS publishes nine issues per year and regularly features special issues on topics of particular interest, including new and exciting research originally presented at major botanical conferences.