{"title":"CNN-based damage classification of soybean kernels using a high-magnification image dataset","authors":"Isparsh Chauhan, Siddharth Kekre, Ankur Miglani, Pavan Kumar Kankar, Milind B. Ratnaparkhe","doi":"10.1007/s11694-025-03195-9","DOIUrl":null,"url":null,"abstract":"<div><p>The assessment of the surface quality of pre-processed soybean kernels is crucial in determining their market acceptance, storage stability, processing quality, and overall consumer approval. Conventional techniques for surface quality evaluation are time-consuming, reliant on personal judgement, and inconsistent. Moreover, existing automated techniques are restricted to either separating healthy soybean kernels from damaged ones without categorizing the damage, or distinguishing between varieties. The lack of a labelled, high-magnification image dataset has hindered the use of advanced CNN models for detailed classification of damage in soybean kernels. Such models excel at end-to-end tasks, minimize pre-processing, and eliminate the need for manual feature extraction, enabling quick, accurate, and precise classification. This study demonstrates the use of a machine vision system to create an image dataset of 9866 high-magnification (2.85 µm/pixel) images of damaged soybean kernels. The dataset encompasses eight distinct classes: healthy, heat damage (HD), immature damage (IMD), mold damage (MD), purple mottled and stained (PMS), stinkbug damage (SBD), shriveled/wrinkle damage (SWD), and tear damage (TAD). Because the samples were collected in the field, the classes are highly imbalanced: healthy is the largest class, accounting for 41% of the dataset, while SBD and PMS are the smallest classes, accounting for just 5% of the dataset. Three memory-efficient deep CNN models, namely EfficientNet-B0, ResNet-50, and VGG-16, are fine-tuned to classify the damaged kernels. Experimental results demonstrate that EfficientNet-B0 outperforms the others in accuracy, average recall, and F1-score, and ranks second in precision. The per-class accuracies are 77% for HD, 92% for healthy, 78% for IMD, 77% for MD, 84% for PMS, 72% for SBD, 75% for SWD, and 92% for TAD. In addition, the model's handling of class imbalance across the eight classes is analyzed by comparing F1-scores: five of the eight classes, including PMS, achieve an F1-score above 80%, while SBD has the lowest at 68%. Overall, EfficientNet-B0 attains a classification accuracy of 85% with a compact model size of 47 MB and predicts 1480 images in under 9 s. In summary, this study shows that deep CNN architectures applied to a high-magnification, highly imbalanced image dataset can accurately classify damaged soybean kernels. The model also handles data imbalance well, making it a useful tool for objective quality assessment of damaged soybean grains at market and trading locations.</p></div>","PeriodicalId":631,"journal":{"name":"Journal of Food Measurement and Characterization","volume":"19 5","pages":"3471 - 3495"},"PeriodicalIF":2.9000,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Food Measurement and Characterization","FirstCategoryId":"97","ListUrlMain":"https://link.springer.com/article/10.1007/s11694-025-03195-9","RegionNum":3,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"FOOD SCIENCE & TECHNOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
The assessment of the surface quality of pre-processed soybean kernels is crucial in determining their market acceptance, storage stability, processing quality, and overall consumer approval. Conventional techniques for surface quality evaluation are time-consuming, reliant on personal judgement, and inconsistent. Moreover, existing automated techniques are restricted to either separating healthy soybean kernels from damaged ones without categorizing the damage, or distinguishing between varieties. The lack of a labelled, high-magnification image dataset has hindered the use of advanced CNN models for detailed classification of damage in soybean kernels. Such models excel at end-to-end tasks, minimize pre-processing, and eliminate the need for manual feature extraction, enabling quick, accurate, and precise classification. This study demonstrates the use of a machine vision system to create an image dataset of 9866 high-magnification (2.85 µm/pixel) images of damaged soybean kernels. The dataset encompasses eight distinct classes: healthy, heat damage (HD), immature damage (IMD), mold damage (MD), purple mottled and stained (PMS), stinkbug damage (SBD), shriveled/wrinkle damage (SWD), and tear damage (TAD). Because the samples were collected in the field, the classes are highly imbalanced: healthy is the largest class, accounting for 41% of the dataset, while SBD and PMS are the smallest classes, accounting for just 5% of the dataset. Three memory-efficient deep CNN models, namely EfficientNet-B0, ResNet-50, and VGG-16, are fine-tuned to classify the damaged kernels. Experimental results demonstrate that EfficientNet-B0 outperforms the others in accuracy, average recall, and F1-score, and ranks second in precision. The per-class accuracies are 77% for HD, 92% for healthy, 78% for IMD, 77% for MD, 84% for PMS, 72% for SBD, 75% for SWD, and 92% for TAD. In addition, the model's handling of class imbalance across the eight classes is analyzed by comparing F1-scores: five of the eight classes, including PMS, achieve an F1-score above 80%, while SBD has the lowest at 68%. Overall, EfficientNet-B0 attains a classification accuracy of 85% with a compact model size of 47 MB and predicts 1480 images in under 9 s. In summary, this study shows that deep CNN architectures applied to a high-magnification, highly imbalanced image dataset can accurately classify damaged soybean kernels. The model also handles data imbalance well, making it a useful tool for objective quality assessment of damaged soybean grains at market and trading locations.
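To make the per-class metrics discussed in the abstract concrete, the sketch below computes per-class precision, recall, and F1-score from a confusion matrix, which is how class-imbalanced performance is typically compared. This is a minimal, hypothetical illustration in plain Python, not code from the paper; the helper name `per_class_f1` and any counts passed to it are assumptions, though the class labels follow the paper's eight categories.

```python
# Per-class precision, recall, and F1 from an 8x8 confusion matrix.
# conf[i][j] = number of samples whose true class is i and predicted class is j.
# Class order is illustrative, following the eight categories named in the paper.

CLASSES = ["healthy", "HD", "IMD", "MD", "PMS", "SBD", "SWD", "TAD"]

def per_class_f1(conf):
    """Return {class_name: (precision, recall, f1)} for a square confusion matrix."""
    n = len(conf)
    scores = {}
    for c in range(n):
        tp = conf[c][c]                               # correctly predicted as c
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, true class differs
        fn = sum(conf[c]) - tp                        # true class c, predicted differently
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        scores[CLASSES[c]] = (precision, recall, f1)
    return scores
```

A low F1 for a minority class such as SBD shows up here even when overall accuracy is high, which is why the paper compares F1-scores rather than accuracy alone when judging how well the model copes with imbalance.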
Journal Introduction:
This interdisciplinary journal publishes new measurement results, characteristic properties, differentiating patterns, measurement methods and procedures for such purposes as food process innovation, product development, quality control, and safety assurance.
The journal encompasses all topics related to food property measurement and characterization, including all types of measured properties of food and food materials, features and patterns, measurement principles and techniques, development and evaluation of technologies, novel uses and applications, and industrial implementation of systems and procedures.