{"title":"使用无监督可解释的人工智能弥合数字取证方面的知识差距","authors":"Zainab Khalid , Farkhund Iqbal , Mohd Saqib","doi":"10.1016/j.fsidi.2025.301924","DOIUrl":null,"url":null,"abstract":"<div><div>Artificial Intelligence (AI) has found multi-faceted applications in critical sectors including Digital Forensics (DF) which also require eXplainability (XAI) as a non-negotiable for its applicability, such as admissibility of expert evidence in the court of law. The state-of-the-art XAI workflows focus more on utilizing XAI tools for supervised learning. This is in contrast to the fact that unsupervised learning may be practically more relevant in DF and other sectors that largely produce complex and unlabeled data continuously, in considerable volumes. This research study explores the challenges and utility of unsupervised learning-based XAI for DF's complex datasets. A memory forensics-based case scenario is implemented to detect anomalies and cluster obfuscated malware using the Isolation Forest, Autoencoder, K-means, DBSCAN, and Gaussian Mixture Model (GMM) unsupervised algorithms on three categorical levels. The CIC MalMemAnalysis-2022 dataset's binary, and multivariate (4, 16) categories are used as a reference to perform clustering. The anomaly detection and clustering results are evaluated using accuracy, confusion matrices and Adjusted Rand Index (ARI) and explained through Shapley Additive Explanations (SHAP), using force, waterfall, scatter, summary, and bar plots' local and global explanations. We also explore how some SHAP explanations may be used for dimensionality reduction.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"53 ","pages":"Article 301924"},"PeriodicalIF":2.2000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bridging knowledge gaps in digital forensics using unsupervised explainable AI\",\"authors\":\"Zainab Khalid , Farkhund Iqbal , Mohd Saqib\",\"doi\":\"10.1016/j.fsidi.2025.301924\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Artificial Intelligence (AI) has found multi-faceted applications in critical sectors including Digital Forensics (DF) which also require eXplainability (XAI) as a non-negotiable for its applicability, such as admissibility of expert evidence in the court of law. The state-of-the-art XAI workflows focus more on utilizing XAI tools for supervised learning. This is in contrast to the fact that unsupervised learning may be practically more relevant in DF and other sectors that largely produce complex and unlabeled data continuously, in considerable volumes. This research study explores the challenges and utility of unsupervised learning-based XAI for DF's complex datasets. A memory forensics-based case scenario is implemented to detect anomalies and cluster obfuscated malware using the Isolation Forest, Autoencoder, K-means, DBSCAN, and Gaussian Mixture Model (GMM) unsupervised algorithms on three categorical levels. The CIC MalMemAnalysis-2022 dataset's binary, and multivariate (4, 16) categories are used as a reference to perform clustering. The anomaly detection and clustering results are evaluated using accuracy, confusion matrices and Adjusted Rand Index (ARI) and explained through Shapley Additive Explanations (SHAP), using force, waterfall, scatter, summary, and bar plots' local and global explanations. 
We also explore how some SHAP explanations may be used for dimensionality reduction.</div></div>\",\"PeriodicalId\":48481,\"journal\":{\"name\":\"Forensic Science International-Digital Investigation\",\"volume\":\"53 \",\"pages\":\"Article 301924\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Forensic Science International-Digital Investigation\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666281725000630\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Forensic Science International-Digital Investigation","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666281725000630","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Bridging knowledge gaps in digital forensics using unsupervised explainable AI
Artificial Intelligence (AI) has found multi-faceted applications in critical sectors, including Digital Forensics (DF), where eXplainability (XAI) is a non-negotiable prerequisite for applicability, for example the admissibility of expert evidence in a court of law. State-of-the-art XAI workflows focus largely on supervised learning, even though unsupervised learning may be more practically relevant in DF and other sectors that continuously produce complex, unlabeled data in considerable volumes. This study explores the challenges and utility of unsupervised-learning-based XAI for DF's complex datasets. A memory-forensics case scenario is implemented to detect anomalies and cluster obfuscated malware using five unsupervised algorithms, Isolation Forest, Autoencoder, K-means, DBSCAN, and Gaussian Mixture Model (GMM), at three categorical levels. The binary and multiclass (4-class and 16-class) categories of the CIC MalMemAnalysis-2022 dataset are used as reference labels for the clustering. The anomaly-detection and clustering results are evaluated using accuracy, confusion matrices, and the Adjusted Rand Index (ARI), and explained through SHapley Additive exPlanations (SHAP) via local and global force, waterfall, scatter, summary, and bar plots. We also explore how some SHAP explanations may be used for dimensionality reduction.
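The pipeline the abstract describes can be outlined in code. Below is a minimal sketch, assuming scikit-learn and a synthetic placeholder feature matrix in place of the actual CIC MalMemAnalysis-2022 memory features; the hyperparameters (contamination, eps, cluster counts) and the 4-class reference labels are illustrative only, and the Autoencoder variant is omitted since it requires a deep-learning framework. ARI is used because it is invariant to label permutation, which makes it suitable for scoring unsupervised cluster assignments against reference categories.

```python
# Minimal sketch of the unsupervised anomaly-detection and clustering
# pipeline. X and y_ref are hypothetical placeholders, not the paper's data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans, DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))          # placeholder for memory-dump features
y_ref = rng.integers(0, 4, size=1000)    # placeholder 4-class reference labels

X_std = StandardScaler().fit_transform(X)

# Anomaly detection: Isolation Forest marks outliers as -1, inliers as +1.
iso = IsolationForest(contamination=0.1, random_state=0).fit(X_std)
anomaly = iso.predict(X_std)

# Clustering at one categorical level (k = 4 here).
labels = {
    "kmeans": KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_std),
    "dbscan": DBSCAN(eps=0.8, min_samples=10).fit_predict(X_std),
    "gmm": GaussianMixture(n_components=4, random_state=0).fit_predict(X_std),
}

# Evaluate cluster assignments against the reference categories.
for name, pred in labels.items():
    print(name, adjusted_rand_score(y_ref, pred))
```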
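The SHAP step can be sketched in the same spirit. This continues from the snippet above and assumes the shap package, whose TreeExplainer accepts scikit-learn's IsolationForest, so the anomaly score can be attributed to individual features. The top-k feature selection at the end is one plausible reading of using SHAP explanations for dimensionality reduction, not necessarily the paper's exact procedure.

```python
# Continuation of the previous sketch: explaining the Isolation Forest's
# anomaly scores with SHAP, then a SHAP-guided feature reduction.
import numpy as np
import shap

explainer = shap.TreeExplainer(iso)          # iso, X_std from the sketch above
shap_values = explainer.shap_values(X_std)   # shape: (n_samples, n_features)

# Local explanation of a single sample (force plot) and global views
# (summary/beeswarm and bar plots) over all samples.
shap.force_plot(explainer.expected_value, shap_values[0, :], X_std[0, :],
                matplotlib=True)
shap.summary_plot(shap_values, X_std)
shap.summary_plot(shap_values, X_std, plot_type="bar")

# One route to SHAP-based dimensionality reduction: rank features by mean
# absolute attribution and keep only the top k before re-clustering.
mean_abs = np.abs(shap_values).mean(axis=0)
top_k = np.argsort(mean_abs)[::-1][:8]       # the 8 most influential features
X_reduced = X_std[:, top_k]
```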