{"title":"Instance segmentation of oyster mushroom datasets: A novel data sampling methodology for training and evaluation of deep learning models","authors":"Christos Charisis, Meiqing Wang, Dimitrios Argyropoulos","doi":"10.1016/j.atech.2025.101146","DOIUrl":null,"url":null,"abstract":"<div><div>This paper proposes a novel data sampling methodology for training and evaluation of deep-learning instance segmentation models using a comprehensive image dataset of oyster mushroom clusters obtained from commercial farms including 25,978 single mushrooms. A custom data splitting and reduction strategy was designed to generate multiple training subsets for an in-depth model performance evaluation. Also, the study aims to examine the ability of five feature extraction backbone configurations of Mask R-CNN: i) CNN-based (ResNet50, ResNeXt101 and ConvNeXt) and ii) Transformer-based (Swin small and tiny) to accurately detect and segment single mushroom instances within the cluster in the images. To complement the standard evaluation metrics (mAP, mAR), two new metrics, namely Correctness and Instance Segmentation Quality Index (ISQI), were introduced. Correctness was used to assess the segmentation quality and ISQI to combine information from both detection (mAR) and segmentation (Correctness). The new metrics examined the consistency of the generated masks across multiple experimental runs on distinct dataset splits, reflecting the ability of the models to produce similar masks despite variations in their training data. The results revealed that ConvNeXt consistently outperformed its counterparts (mAP = 0.7675, mAR = 0.8071; Correctness = 0.9160, ISQI = 0.8598) in all dataset sizes, demonstrating superior detection ability, even in cases of high occlusion and low illumination. Swin also exhibited high detection performance (mAP = 0.7616, mAR = 0.7991; Correctness = 0.9126, ISQI = 0.8540), however with a greater dependence on the dataset size. Overall, this research highlights the importance of properly evaluating backbone architectures across different dataset sizes for developing robust DL instance segmentation models applicable to mushroom farming or other visually complex environments.</div></div>","PeriodicalId":74813,"journal":{"name":"Smart agricultural technology","volume":"12 ","pages":"Article 101146"},"PeriodicalIF":5.7000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Smart agricultural technology","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772375525003788","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AGRICULTURAL ENGINEERING","Score":null,"Total":0}
Abstract
This paper proposes a novel data sampling methodology for training and evaluation of deep learning (DL) instance segmentation models, using a comprehensive image dataset of oyster mushroom clusters obtained from commercial farms and comprising 25,978 individual mushrooms. A custom data splitting and reduction strategy was designed to generate multiple training subsets for an in-depth evaluation of model performance. The study also examines the ability of five feature extraction backbone configurations of Mask R-CNN, i) CNN-based (ResNet50, ResNeXt101 and ConvNeXt) and ii) Transformer-based (Swin Small and Swin Tiny), to accurately detect and segment single mushroom instances within clusters in the images. To complement the standard evaluation metrics (mAP, mAR), two new metrics, namely Correctness and the Instance Segmentation Quality Index (ISQI), were introduced. Correctness was used to assess segmentation quality, while ISQI combines information from both detection (mAR) and segmentation (Correctness). The new metrics examined the consistency of the generated masks across multiple experimental runs on distinct dataset splits, reflecting the ability of the models to produce similar masks despite variations in their training data. The results revealed that ConvNeXt consistently outperformed its counterparts (mAP = 0.7675, mAR = 0.8071; Correctness = 0.9160, ISQI = 0.8598) across all dataset sizes, demonstrating superior detection ability even under high occlusion and low illumination. Swin also exhibited high detection performance (mAP = 0.7616, mAR = 0.7991; Correctness = 0.9126, ISQI = 0.8540), albeit with a greater dependence on dataset size. Overall, this research highlights the importance of properly evaluating backbone architectures across different dataset sizes when developing robust DL instance segmentation models applicable to mushroom farming and other visually complex environments.
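The abstract describes a splitting-and-reduction strategy that yields multiple training subsets of different sizes, but does not spell out the procedure. The sketch below shows one plausible scheme under that reading: a fixed training pool is shuffled once and nested subsets of decreasing size are drawn from it, so that models trained on each subset can be compared fairly. The function name, fraction values, and nesting are assumptions for illustration, not the paper's exact method.

```python
import random

def make_reduction_subsets(image_ids, fractions=(1.0, 0.75, 0.5, 0.25), seed=0):
    """Draw nested training subsets of decreasing size from one shuffled pool.

    Hypothetical sketch: the fractions, the nesting, and the single-shuffle
    design are assumptions; the paper's exact strategy is not in the abstract.
    """
    rng = random.Random(seed)          # fixed seed so splits are reproducible
    pool = list(image_ids)
    rng.shuffle(pool)
    # Nested prefixes: every smaller subset is contained in the larger ones.
    return {f: pool[: int(len(pool) * f)] for f in fractions}

subsets = make_reduction_subsets(range(1000))
print({f: len(ids) for f, ids in subsets.items()})
# {1.0: 1000, 0.75: 750, 0.5: 500, 0.25: 250}
```

Nesting the subsets (rather than resampling each independently) has the advantage that performance differences across sizes reflect the amount of data rather than which images happened to be drawn.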
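For context on the five backbone configurations: the ResNet50 variant of Mask R-CNN is available off the shelf in torchvision, as in the minimal sketch below. The ConvNeXt and Swin variants are not built into torchvision's detection API and are typically assembled in a detection framework such as MMDetection, which provides configs pairing Mask R-CNN heads with those backbones. The `num_classes=2` setting (background plus a single mushroom class) is an assumption for illustration.

```python
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Baseline CNN configuration: Mask R-CNN with a ResNet50-FPN backbone.
# weights=None trains from scratch; num_classes=2 assumes background + mushroom.
model = maskrcnn_resnet50_fpn(weights=None, num_classes=2)
model.eval()
```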
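The abstract states that ISQI combines mAR and Correctness but gives no formula. One combination consistent with the reported numbers is the geometric mean: sqrt(0.8071 x 0.9160) ≈ 0.8598 for ConvNeXt and sqrt(0.7991 x 0.9126) ≈ 0.8540 for Swin, matching the reported ISQI values. The snippet below verifies that arithmetic; the formula itself is an inference from the published figures, not a definition quoted from the paper.

```python
import math

def isqi(mar: float, correctness: float) -> float:
    """Combine detection (mAR) and segmentation (Correctness) into one score.

    Assumed form: geometric mean, inferred because it reproduces the
    ISQI values reported in the abstract for both backbones.
    """
    return math.sqrt(mar * correctness)

print(round(isqi(0.8071, 0.9160), 4))  # 0.8598 (ConvNeXt, as reported)
print(round(isqi(0.7991, 0.9126), 4))  # 0.854  (Swin, as reported)
```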