AEC: A novel adaptive ensemble classifier with LIME and SHAP-Based interpretability for fake news detection

IF 7.5 · Zone 1, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications, Volume 281, Article 127751 · Pub Date: 2025-04-15 · DOI: 10.1016/j.eswa.2025.127751

Ashima Kukkar, Gagandeep Kaur


Citations: 0

Abstract





The constant spread of fake news on social media and other information-sharing platforms has created a demand for efficient and effective fake news detection models. Current solutions rely on static feature extraction or on a single classifier, or they achieve modest results that do not meet the demands of today's complex data environment. To address these limitations, this study presents the Adaptive Ensemble Classifier (AEC), a new ensemble system that combines hybrid decision trees with Support Vector Machine (SVM)-style margin optimization to improve classification performance. AEC incorporates several innovative features: dynamic feature selection through adaptive neighbourhood selection based on feature importance, SVM-based refinement of decision boundaries for improved precision, and a weighted ensemble voting mechanism for robust predictions. To ensure explainability, the system uses LIME and SHAP to provide probability-based explanations of its predictions and of the features that influence them. AEC is evaluated on public datasets such as the Fake News dataset, and its cross-domain performance is assessed on the COVID-19 Fake News Dataset. Experimental results show that the proposed model achieves an accuracy of 99.74% and outperforms traditional Machine Learning (ML) and Deep Learning (DL) models in accuracy, precision, recall, and F1 score. Computational efficiency is also evaluated by comparing training time, memory usage, peak memory usage, inference time, and model size with those of existing models. The interpretability offered by LIME highlights the features that most affect each prediction, which makes the system more useful in real-world settings. Finally, statistical tests (the paired t-test, Kruskal-Wallis test, Dunn's test, bootstrap resampling, Cohen's d effect size, and confidence intervals) are used to compare the proposed AEC with existing ML and DL models. The results show a significant performance difference between the proposed AEC and the other models.
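
The abstract describes AEC's weighted ensemble voting only at a high level. As a rough illustration of how such a mechanism can work, the sketch below uses soft voting: each base model (for example a decision tree and an SVM) outputs class probabilities, and these are blended with per-model weights. Weighting by validation accuracy, and the function names `accuracy_weights` and `weighted_soft_vote`, are hypothetical choices for illustration, not details taken from the paper.

```python
from __future__ import annotations

from typing import Sequence


def accuracy_weights(val_accuracies: Sequence[float]) -> list[float]:
    """Normalise per-model validation accuracies into voting weights summing to 1.

    Assumption: weights proportional to validation accuracy (not confirmed by the paper).
    """
    total = sum(val_accuracies)
    if total <= 0:
        raise ValueError("at least one model needs a positive validation accuracy")
    return [acc / total for acc in val_accuracies]


def weighted_soft_vote(
    probas: Sequence[Sequence[float]], weights: Sequence[float]
) -> tuple[int, list[float]]:
    """Blend per-model class-probability vectors for a single sample.

    probas[m][c] is model m's probability for class c; weights[m] is model m's weight.
    Returns the index of the winning class and the blended probability vector.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    n_classes = len(probas[0])
    blended = [0.0] * n_classes
    for model_proba, w in zip(probas, weights):
        # Each model contributes its probabilities scaled by its weight.
        for c, p in enumerate(model_proba):
            blended[c] += w * p
    winner = max(range(n_classes), key=lambda c: blended[c])
    return winner, blended
```

For example, with hypothetical validation accuracies of 0.90 (tree) and 0.96 (SVM), a tree scoring `[0.3, 0.7]` and an SVM scoring `[0.6, 0.4]` blend to roughly `[0.455, 0.545]`, so class 1 wins: a confident vote from a lower-weighted model can still outweigh a hesitant one, which is the main difference from hard majority voting.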
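
SHAP attributes a prediction to input features using Shapley values. The `shap` library computes or approximates these with model-specific and sampling algorithms; the brute-force sketch below instead enumerates every feature subset, which is the definition those algorithms target and is only practical for a handful of features. Replacing "absent" features with a single baseline vector is one common choice of value function and is an assumption here, as are the function name `exact_shapley` and the toy models in the usage check.

```python
from __future__ import annotations

from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley attributions for one input by enumerating all feature subsets.

    A feature outside a coalition is replaced by its baseline value (an assumption).
    Cost grows as 2**n, so this only suits a small number of features.
    """
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: frozenset[int]) -> float:
        # Features in the coalition keep their real value; the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for every coalition S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

A quick sanity check: for a linear model the attribution of feature i reduces to w_i (x_i - baseline_i), and for any model the attributions sum to `predict(x) - predict(baseline)`. For the interaction model `z[0] * z[1]` at `x = [2, 3]` with a zero baseline, both features receive 3.0, i.e. the interaction is split evenly.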
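
Several of the tests listed in the abstract compare two models scored on the same folds or runs. The sketch below, which is not the authors' analysis code, computes three of them with the standard library only: the paired t statistic, Cohen's d for paired data (mean difference over the standard deviation of the differences), and a percentile bootstrap confidence interval for the mean difference. It assumes per-fold scores are available for both models; the function name `paired_comparison` and its parameters are hypothetical. A p-value would additionally need the t distribution, for example via `scipy.stats.ttest_rel` if SciPy is available (and `scipy.stats.kruskal` covers Kruskal-Wallis).

```python
from __future__ import annotations

import random
import statistics
from typing import Sequence


def paired_comparison(
    a: Sequence[float],
    b: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> dict:
    """Compare two models whose scores are paired (a[k] and b[k] come from the same fold)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD with an (n - 1) denominator
    if sd_d == 0:
        raise ValueError("differences are constant; t and d are undefined")

    t_stat = mean_d / (sd_d / n ** 0.5)  # paired t statistic with n - 1 degrees of freedom
    cohens_d = mean_d / sd_d  # effect size for paired data (often written d_z)

    # Percentile bootstrap: resample the differences with replacement.
    rng = random.Random(seed)
    boot_means = sorted(statistics.mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return {"t": t_stat, "df": n - 1, "cohens_d": cohens_d, "ci": (lo, hi)}
```

Because t equals d multiplied by the square root of the number of pairs, a large effect size on few folds can still give a modest t statistic; reporting both, together with the bootstrap interval, makes that visible.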


Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months

Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.