Enhancing date fruit classification using machine learning, CTGAN, and SHAP-based explainability

IF 3.3 | Zone 3 (Agricultural and Forestry Sciences) | JCR Q2 (Food Science & Technology)
Prokash Gogoi, J. Arul Valan
Journal: Journal of Food Measurement and Characterization, 19(9), pp. 6851-6872
Publication date: 2025-06-23
Publication type: Journal Article
DOI: 10.1007/s11694-025-03428-x
URL: https://link.springer.com/article/10.1007/s11694-025-03428-x
Citations: 0

Abstract

The date palm (Phoenix dactylifera) plays a vital role in the cultural and economic sectors of the Middle East, Asia, and North Africa. Accurate and efficient classification of date fruit varieties is crucial, yet traditional manual sorting methods remain inefficient and error-prone. This study aims to develop an automated classification system using machine learning classifiers, address data imbalances with generative AI techniques, and enhance model interpretability through Explainable AI methods. The dataset used in this study contains 898 samples, each comprising 34 features extracted from images of seven different types of date fruits. The Generative AI technique, Conditional Tabular Generative Adversarial Network (CTGAN), was utilized to address class imbalance issues by generating synthetic data. Four classifiers, XGBoost, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP), were trained on balanced training data and tested on separate test data. Additionally, the Explainable AI method, SHapley Additive exPlanations (SHAP), was employed to interpret feature importance, enhancing transparency and understanding of the model’s decision-making. Results indicated that the MLP classifier achieved the highest accuracy of 96.11%, with Precision, Recall, and F1 Score values of 94.80%, 95.38%, and 95.02%, respectively. Our approach combining generative and explainable AI techniques significantly improves the efficiency and reliability of date fruit classification systems, offering a scalable and interpretable solution for automating grading and quality assessment. This research provides a practical, data-driven framework for agricultural applications, with potential extensions to real-time data integration and hybrid models for further performance enhancement.
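
Two bookkeeping steps in the workflow described above can be sketched in plain Python: working out how many synthetic rows each class needs for the training set to be balanced, and scoring predictions with accuracy, precision, recall, and F1. This is a minimal illustration, not the authors' code. The function names `synthetic_rows_needed` and `macro_scores` and the class labels "A", "B", "C" are hypothetical, the toy labels are not from the paper's dataset, and macro averaging is an assumption because the abstract does not say how the per-class metrics were averaged. CTGAN itself is not called here; the first function only computes the row counts that a generator would be asked to produce.

```python
from collections import Counter


def synthetic_rows_needed(train_labels):
    """Rows to generate per class so every class matches the largest one."""
    counts = Counter(train_labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy, hypothetical labels: an imbalanced training set with three classes.
train_labels = ["A"] * 5 + ["B"] * 2 + ["C"] * 3
needed = synthetic_rows_needed(train_labels)

# Toy held-out predictions; the test split is left untouched by balancing.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "C", "C", "C"]
scores = macro_scores(y_true, y_pred)

print(needed)
print(scores)
```

Only the training labels are balanced in this sketch, in line with the abstract's statement that the classifiers were trained on balanced training data and tested on separate test data. Under macro averaging each class counts equally, so accuracy and the averaged metrics can differ when class sizes or per-class error rates are uneven.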

Source journal
Journal of Food Measurement and Characterization (Agricultural and Biological Sciences: Food Science)
CiteScore: 6.00
Self-citation rate: 11.80%
Articles published: 425
About the journal: This interdisciplinary journal publishes new measurement results, characteristic properties, differentiating patterns, measurement methods and procedures for such purposes as food process innovation, product development, quality control, and safety assurance. The journal encompasses all topics related to food property measurement and characterization, including all types of measured properties of food and food materials, features and patterns, measurement principles and techniques, development and evaluation of technologies, novel uses and applications, and industrial implementation of systems and procedures.