Comparing YOLOv8 and Mask R-CNN for instance segmentation in complex orchard environments

Impact Factor: 8.2 | JCR Q1 (Agriculture, Multidisciplinary)

Artificial Intelligence in Agriculture | Pub Date: 2024-07-16 | DOI: 10.1016/j.aiia.2024.07.001

Ranjan Sapkota, Dawood Ahmed, Manoj Karkee

Volume 13, Pages 84-99. Full text: https://www.sciencedirect.com/science/article/pii/S258972172400028X

Citations: 0

Abstract

Instance segmentation, an important image processing operation for automation in agriculture, precisely delineates individual objects of interest within images, providing foundational information for automated or robotic tasks such as selective harvesting and precision pruning. This study compares the one-stage YOLOv8 and the two-stage Mask R-CNN machine learning models for instance segmentation under varying orchard conditions across two datasets. Dataset 1, collected in the dormant season, includes images of dormant apple trees, which were used to train multi-object segmentation models delineating tree branches and trunks. Dataset 2, collected in the early growing season, includes images of apple tree canopies with green foliage and immature (green) apples (also called fruitlets), which were used to train single-object segmentation models delineating only immature green apples. The results showed that YOLOv8 performed better than Mask R-CNN, achieving good precision and near-perfect recall across both datasets at a confidence threshold of 0.5. Specifically, for Dataset 1, YOLOv8 achieved a precision of 0.90 and a recall of 0.95 across all classes, whereas Mask R-CNN achieved a precision of 0.81 and a recall of 0.81. For Dataset 2, YOLOv8 achieved a precision of 0.93 and a recall of 0.97, whereas Mask R-CNN, in this single-class scenario, achieved a precision of 0.85 and a recall of 0.88. Additionally, the inference times for YOLOv8 were 10.9 ms for multi-class segmentation (Dataset 1) and 7.8 ms for single-class segmentation (Dataset 2), compared to 15.6 ms and 12.8 ms, respectively, for Mask R-CNN. These findings show YOLOv8's superior accuracy and efficiency compared to two-stage models, specifically Mask R-CNN, which suggests its suitability for developing smart and automated orchard operations, particularly where real-time performance is necessary, such as robotic harvesting and robotic thinning of immature green fruit.
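To make the reported numbers easier to compare, the sketch below derives two quantities that the abstract does not state: the F1 score (the harmonic mean of precision and recall) and an approximate throughput in frames per second. The precision, recall, and inference-time values are copied from the abstract; the `Result` class, the `RESULTS` list, and the `fps` conversion are illustrative assumptions. In particular, the conversion assumes the inference time is per image and ignores pre- and post-processing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One (model, dataset) row, using values reported in the abstract."""
    model: str
    dataset: str
    precision: float
    recall: float
    inference_ms: float  # assumed to be per-image inference time

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall (derived, not reported).
        return 2 * self.precision * self.recall / (self.precision + self.recall)

    @property
    def fps(self) -> float:
        # Rough throughput; assumes one image per inference call.
        return 1000.0 / self.inference_ms


RESULTS = [
    Result("YOLOv8", "Dataset 1", 0.90, 0.95, 10.9),
    Result("Mask R-CNN", "Dataset 1", 0.81, 0.81, 15.6),
    Result("YOLOv8", "Dataset 2", 0.93, 0.97, 7.8),
    Result("Mask R-CNN", "Dataset 2", 0.85, 0.88, 12.8),
]


def summarize(results: list[Result]) -> None:
    """Print the reported metrics next to the derived F1 and FPS values."""
    print(f"{'model':<12}{'dataset':<11}{'P':>6}{'R':>6}{'F1':>7}{'ms':>7}{'FPS':>8}")
    for r in results:
        print(
            f"{r.model:<12}{r.dataset:<11}{r.precision:>6.2f}{r.recall:>6.2f}"
            f"{r.f1:>7.3f}{r.inference_ms:>7.1f}{r.fps:>8.1f}"
        )


if __name__ == "__main__":
    summarize(RESULTS)
```

The derived values are only as precise as the two-decimal figures in the abstract, so they are best read as a quick comparison aid rather than as results from the study.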
