Pretrained Convolutional Neural Networks Perform Well in a Challenging Test Case: Identification of Plant Bugs (Hemiptera: Miridae) Using a Small Number of Training Images

IF 3.1 | Zone 1 (Agricultural and Forestry Sciences) | JCR Q1 ENTOMOLOGY

Insect Systematics and Diversity Pub Date : 2021-03-01 DOI:10.1093/isd/ixab004

A. Knyshov, Samantha Hoang, C. Weirauch

Citations: 9

Abstract

Automated insect identification systems have been explored for more than two decades but have only recently started to take advantage of powerful and versatile convolutional neural networks (CNNs). While typical CNN applications still require large training image datasets with hundreds of images per taxon, pretrained CNNs have recently been shown to be highly accurate even when trained on much smaller datasets. Here we evaluate the performance of CNN-based machine learning approaches in identifying three curated species-level dorsal habitus datasets for Miridae, the plant bugs. Miridae are of economic importance, but species-level identifications are challenging and typically rely on information other than dorsal habitus (e.g., host plants, locality, genitalic structures). Each dataset contained 2–6 species and 126–246 images in total, with a mean of only 32 images per species for the most difficult dataset. We find that closely related species of plant bugs can be identified with 80–90% accuracy based on their dorsal habitus alone. The pretrained CNN performed 10–20% better than a taxon expert who had access to the same dorsal habitus images. We find that feature extraction protocols (selection and combination of blocks of CNN layers) affect identification accuracy much more than the classifying mechanism (support vector machine and deep neural network classifiers). While our network has much lower accuracy on photographs of live insects (62%), overall results confirm that a pretrained CNN can be straightforwardly adapted to collection-based images for a new taxonomic group and can successfully extract relevant features to classify insect species.
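The feature extraction protocol mentioned in the abstract (selecting blocks of CNN layers and combining their outputs before classification) can be illustrated with a minimal sketch. The abstract does not name the network, the blocks, or the pooling used, so everything below is an assumption made for illustration: the block names `block4` and `block5`, the global average pooling, and the synthetic activations are hypothetical, and a nearest-centroid classifier is swapped in for the support vector machine and deep neural network classifiers so that the example needs only the Python standard library.

```python
import random
from statistics import fmean


def global_average_pool(feature_map):
    """Collapse a channels x height x width activation (nested lists)
    to one mean value per channel."""
    return [fmean(v for row in channel for v in row) for channel in feature_map]


def extract_features(block_outputs, selected_blocks):
    """Pool each selected block and concatenate the results
    into a single feature vector for one image."""
    features = []
    for name in selected_blocks:
        features.extend(global_average_pool(block_outputs[name]))
    return features


def fit_centroids(vectors, labels):
    """Nearest-centroid stand-in classifier: one mean vector per class."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {lab: [fmean(col) for col in zip(*vecs)] for lab, vecs in grouped.items()}


def predict(centroids, vector):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], vector))


def fake_block_outputs(rng, shift):
    """Synthetic stand-in for the activations a pretrained CNN would
    produce for one image (hypothetical shapes and block names)."""
    def fmap(channels, size, offset):
        return [[[rng.gauss(offset, 1.0) for _ in range(size)]
                 for _ in range(size)] for _ in range(channels)]
    return {"block4": fmap(4, 6, shift), "block5": fmap(8, 3, -shift)}


def run_demo(seed=0, per_class=20, selected_blocks=("block4", "block5")):
    """Train on synthetic images of two hypothetical species and
    return the accuracy on a separate synthetic test set."""
    rng = random.Random(seed)
    classes = {"species_a": 0.0, "species_b": 1.0}

    def make_set():
        vectors, labels = [], []
        for label, shift in classes.items():
            for _ in range(per_class):
                outputs = fake_block_outputs(rng, shift)
                vectors.append(extract_features(outputs, selected_blocks))
                labels.append(label)
        return vectors, labels

    train_x, train_y = make_set()
    test_x, test_y = make_set()
    centroids = fit_centroids(train_x, train_y)
    hits = sum(predict(centroids, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


if __name__ == "__main__":
    print("accuracy with both blocks:", run_demo())
    print("accuracy with block5 only:", run_demo(selected_blocks=("block5",)))
```

Changing `selected_blocks` changes which pooled activations reach the classifier, which is the kind of protocol choice the abstract reports as mattering more than the classifier itself. In a real pipeline, the synthetic activations would be replaced by the intermediate outputs of a pretrained network, and the stand-in classifier by an SVM or a small neural network.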

Source journal

Insect Systematics and Diversity ENTOMOLOGY-

CiteScore

5.30

Self-citation rate

8.80%

Articles published