Enhancing date fruit classification using machine learning, CTGAN, and SHAP-based explainability
Prokash Gogoi, J. Arul Valan
Journal of Food Measurement and Characterization, 19(9), 6851–6872, published 2025-06-23. DOI: 10.1007/s11694-025-03428-x
The date palm (Phoenix dactylifera) plays a vital role in the culture and economy of the Middle East, Asia, and North Africa. Accurate and efficient classification of date fruit varieties is crucial, yet traditional manual sorting remains slow and error-prone. This study develops an automated classification system based on machine learning classifiers, addresses class imbalance with a generative AI technique, and improves model interpretability with Explainable AI methods. The dataset contains 898 samples from seven date fruit varieties, each described by 34 features extracted from images. The Conditional Tabular Generative Adversarial Network (CTGAN), a generative AI technique, was used to generate synthetic samples that address the class imbalance. Four classifiers, XGBoost, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP), were trained on the balanced training data and evaluated on a separate test set. SHapley Additive exPlanations (SHAP), an Explainable AI method, was used to interpret feature importance, making the model's decision-making more transparent. The MLP classifier achieved the highest accuracy, 96.11%, with precision, recall, and F1 score of 94.80%, 95.38%, and 95.02%, respectively. Our approach, which combines generative and explainable AI techniques, improves the efficiency and reliability of date fruit classification, offering a scalable and interpretable solution for automated grading and quality assessment. This research provides a practical, data-driven framework for agricultural applications, with potential extensions to real-time data integration and hybrid models for further performance gains.
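The abstract reports accuracy alongside precision, recall, and F1 score for a seven-class problem but does not say how the per-class scores were averaged. The sketch below assumes macro averaging (an unweighted mean over classes), a common choice for imbalanced multi-class data; the function name `classification_metrics` and the toy labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1.

    ASSUMPTION: macro averaging (unweighted mean over classes); the paper's
    abstract does not state which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        # Accuracy weights every sample equally.
        "accuracy": sum(tp.values()) / len(y_true),
        # Macro scores weight every class equally.
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical toy labels for three classes (not data from the paper).
    m = classification_metrics(["A", "A", "A", "B", "B", "C"],
                               ["A", "A", "B", "B", "C", "C"])
    for name, value in m.items():
        print(f"{name}: {value:.3f}")
    # accuracy: 0.667, precision: 0.667, recall: 0.722, f1: 0.656
```

Because accuracy counts samples while macro scores count classes, accuracy can exceed the macro scores when larger classes are classified more reliably. Macro F1 is also the mean of per-class F1 values, so it need not equal the harmonic mean of macro precision and macro recall.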
About the journal:
This interdisciplinary journal publishes new measurement results, characteristic properties, differentiating patterns, measurement methods and procedures for such purposes as food process innovation, product development, quality control, and safety assurance.
The journal encompasses all topics related to food property measurement and characterization, including all types of measured properties of food and food materials, features and patterns, measurement principles and techniques, development and evaluation of technologies, novel uses and applications, and industrial implementation of systems and procedures.