{"title":"GradCAM-PestDetNet: A deep learning-based hybrid model with explainable AI for pest detection and classification","authors":"Ramitha Vimala , Saharsh Mehrotra , Satish Kumar , Pooja Kamat , Arunkumar Bongale , Ketan Kotecha","doi":"10.1016/j.mex.2025.103533","DOIUrl":null,"url":null,"abstract":"<div><div>Pest detection is crucial for both agriculture and ecology. The growing global population demands an efficient pest detection system to ensure food security. Pests threaten agricultural productivity, sustainability, and economic development. They also cause damage to machinery, equipment and soil, making effective detection essential for commercial benefits. Traditional pest detection methods are often slow, less accurate and reliant on expert knowledge. With advancements in computer vision and AI, deep transfer learning models (DTLMs) have emerged as powerful solutions. The GradCAM-PestDetNet methodology utilizes object detection models like YOLOv8m, YOLOv8s and YOLOv8n, alongside transfer learning techniques such as VGG16, ResNet50, EfficientNetB0, MobileNetV2, InceptionV3 and DenseNet121 for feature extraction. Additionally, Vision Transformers (ViT) and Swim Transformers were explored for their ability to process complex data patterns. To enhance model interpretability, GradCAM-PestDetNet integrates Gradient-weighted Class Activation Mapping (Grad-CAM), allowing better visualization of model predictions.<ul><li><span>•</span><span><div>Uses YOLOv8 models (YOLOv8n for fastest inference at 1.86 ms/img) and transfer learning for pest detection ensuring that the system is viable for low-resource environments.</div></span></li><li><span>•</span><span><div>Employs an ensemble model (ResNet50, DenseNet, MobileNet) that achieved 67.07 % accuracy, 66.3 % F1-score and 68.1 % recall. This is an improvement over the baseline CNN which gave an accuracy of 21.5 %. This ensures a more generalized and robust model that is not biased towards the majority class.</div></span></li><li><span>•</span><span><div>Integrates Grad-CAM for improved interpretability in pest detection.</div></span></li></ul></div></div>","PeriodicalId":18446,"journal":{"name":"MethodsX","volume":"15 ","pages":"Article 103533"},"PeriodicalIF":1.9000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"MethodsX","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2215016125003772","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Abstract
Pest detection is crucial for both agriculture and ecology. The growing global population demands efficient pest detection systems to ensure food security. Pests threaten agricultural productivity, sustainability, and economic development; they also damage machinery, equipment, and soil, making effective detection economically essential. Traditional pest detection methods are often slow, inaccurate, and reliant on expert knowledge. With advances in computer vision and AI, deep transfer learning models (DTLMs) have emerged as powerful alternatives. The GradCAM-PestDetNet methodology uses object detection models (YOLOv8m, YOLOv8s, and YOLOv8n) alongside transfer learning backbones (VGG16, ResNet50, EfficientNetB0, MobileNetV2, InceptionV3, and DenseNet121) for feature extraction. Vision Transformers (ViT) and Swin Transformers were also explored for their ability to capture complex data patterns. To enhance model interpretability, GradCAM-PestDetNet integrates Gradient-weighted Class Activation Mapping (Grad-CAM), enabling visualization of the image regions that drive model predictions.
• Uses YOLOv8 models (YOLOv8n offering the fastest inference at 1.86 ms/img) together with transfer learning for pest detection, making the system viable for low-resource environments (see the inference sketch after this list).
• Employs an ensemble model (ResNet50, DenseNet, MobileNet) that achieved 67.07 % accuracy, 66.3 % F1-score, and 68.1 % recall, a substantial improvement over the baseline CNN's 21.5 % accuracy; this yields a more generalized, robust model that is not biased towards the majority class (see the ensemble sketch below).
• Integrates Grad-CAM for improved interpretability of pest detections (a minimal Grad-CAM sketch follows the list).
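As an illustration of the detection stage, here is a minimal sketch of single-image inference with the lightweight YOLOv8n variant via the `ultralytics` package; the image path and confidence threshold are placeholders, not values from the paper.

```python
from ultralytics import YOLO

# Load the lightweight YOLOv8n variant (fastest of the three sizes used).
model = YOLO("yolov8n.pt")

# Run detection on a single image; the path and confidence
# threshold below are illustrative placeholders.
results = model.predict("pest_image.jpg", conf=0.25)

for result in results:
    for box in result.boxes:
        label = model.names[int(box.cls)]      # predicted class name
        score = float(box.conf)                # detection confidence
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding-box corners
        print(f"{label}: {score:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```

Timing many such calls is how a per-image latency figure like 1.86 ms/img would typically be measured, though the paper's exact benchmarking setup is not described here.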
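The abstract does not state how the three ensemble branches are combined; a common choice, sketched below under that assumption, is soft voting (averaging softmax probabilities) over ImageNet-pretrained ResNet50, DenseNet121, and MobileNetV2 heads. The class count and input size are hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import ResNet50, DenseNet121, MobileNetV2

NUM_CLASSES = 12  # hypothetical number of pest classes

def build_branch(backbone_fn, name):
    """Frozen ImageNet backbone plus a small trainable softmax head."""
    base = backbone_fn(include_top=False, weights="imagenet",
                       input_shape=(224, 224, 3), pooling="avg")
    base.trainable = False
    inputs = tf.keras.Input((224, 224, 3))
    features = base(inputs, training=False)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(features)
    return tf.keras.Model(inputs, outputs, name=name)

branches = [
    build_branch(ResNet50, "resnet50_branch"),
    build_branch(DenseNet121, "densenet121_branch"),
    build_branch(MobileNetV2, "mobilenetv2_branch"),
]

def ensemble_predict(image_batch):
    # Soft voting: average per-branch class probabilities, then argmax.
    probs = np.mean([m.predict(image_batch, verbose=0) for m in branches], axis=0)
    return probs.argmax(axis=1)

# Usage with a random placeholder batch (real inputs would be
# preprocessed pest images of shape (N, 224, 224, 3)).
dummy = np.random.rand(2, 224, 224, 3).astype("float32")
print(ensemble_predict(dummy))
```

Averaging probabilities rather than hard votes lets a confident branch outweigh two uncertain ones, which is one way an ensemble can avoid majority-class bias.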
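Grad-CAM itself is model-agnostic; a minimal Keras sketch follows, assuming a MobileNetV2 backbone (one of the feature extractors listed in the abstract). The layer name "Conv_1" is MobileNetV2's final convolutional layer, and the random input stands in for a preprocessed pest image.

```python
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    # Map the input to (last conv activations, final predictions).
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image)
        if class_index is None:
            class_index = tf.argmax(preds[0])  # top predicted class
        class_score = preds[:, class_index]
    # Gradient of the class score w.r.t. the conv feature map,
    # pooled into one importance weight per channel.
    grads = tape.gradient(class_score, conv_out)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted channel sum, ReLU, then normalize to [0, 1].
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()

model = MobileNetV2(weights="imagenet")
image = tf.random.uniform((1, 224, 224, 3))  # stand-in pest image
heatmap = grad_cam(model, image, "Conv_1")   # MobileNetV2's last conv layer
print(heatmap.shape)  # coarse (7, 7) localization map
```

In practice the heatmap is upsampled to the input resolution and overlaid on the image, visualizing which regions drove the pest classification.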