SIFT: Sifting file types—application of explainable artificial intelligence in cyber forensics

Impact factor 3.9 · Zone 4, Computer Science · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Shahid Alam, Alper Kamil Demir
Journal: Cybersecurity. Publication type: Journal Article. Published: 2024-09-11. DOI: 10.1186/s42400-024-00241-9 (https://doi.org/10.1186/s42400-024-00241-9)
Citations: 0

Abstract

Artificial Intelligence (AI) is being applied to improve the efficiency of software systems in many domains, especially the health and forensic sciences. Explainable AI (XAI) is a field of AI that interprets and explains the methods used in AI. One XAI technique for providing such interpretations is to compute the relevance of the input features to the output of an AI model. File fragment classification is a vital problem in file carving in Cyber Forensics (CF), and it becomes challenging when the filesystem metadata is missing. Other major challenges are the proliferation of file formats, file embeddings, and automation. We use the interpretations provided by XAI to optimize the classification of file fragments and propose a novel sifting approach named SIFT (Sifting File Types). SIFT employs TF-IDF to assign a weight to each byte (feature), and the weights are used to select features from a file fragment. Threshold-based feature relevance values from LIME and SHAP (the two XAI techniques) are then computed for the selected features to optimize file fragment classification. To improve multinomial classification, a Multilayer Perceptron model is developed and optimized with five hidden layers, each with \(i \times n\) neurons, where i is the layer number and n is the total number of classes in the dataset. When tested with 47,482 samples of 20 file types (classes), SIFT achieves a detection rate of 82.1% and outperforms the other state-of-the-art techniques by at least 10%. To the best of our knowledge, this is the first effort to apply XAI in CF for optimizing file fragment classification.
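
Two of the steps named in the abstract can be illustrated with a minimal sketch: the \(i \times n\) hidden-layer sizing rule and TF-IDF weighting of byte values for feature selection. This is not the authors' implementation. The abstract does not state which TF-IDF variant SIFT uses, so the form below (term frequency normalized by fragment length, inverse document frequency log(N/df)) is an assumption, and the function names `hidden_layer_sizes`, `byte_tfidf`, and `top_bytes` are hypothetical. The LIME/SHAP thresholding step is left out because the abstract gives no details on it.

```python
import math
from collections import Counter


def hidden_layer_sizes(n_classes, n_hidden=5):
    """Hidden layer i (1-based) gets i * n_classes neurons, as the abstract states."""
    return [i * n_classes for i in range(1, n_hidden + 1)]


def byte_tfidf(fragments):
    """Return one {byte_value: weight} dict per fragment.

    Assumed variant (not confirmed by the abstract):
    tf = count / fragment length, idf = log(N / df).
    """
    n = len(fragments)
    # Document frequency: number of fragments that contain each byte value.
    df = Counter()
    for frag in fragments:
        df.update(set(frag))
    weights = []
    for frag in fragments:
        counts = Counter(frag)
        total = len(frag)
        weights.append(
            {b: (c / total) * math.log(n / df[b]) for b, c in counts.items()}
        )
    return weights


def top_bytes(weight_map, k):
    """Pick the k byte values with the highest weight (ties go to the lower byte value)."""
    ranked = sorted(weight_map.items(), key=lambda kv: (-kv[1], kv[0]))
    return [b for b, _ in ranked[:k]]


if __name__ == "__main__":
    # 20 classes, as in the abstract's dataset.
    print(hidden_layer_sizes(20))
    # Toy fragments, invented for illustration only.
    frags = [b"\x89PNG\x89\x89", b"%PDF-1.7", b"PK\x03\x04PK"]
    print(top_bytes(byte_tfidf(frags)[0], 2))
```

With n = 20 classes, the sizing rule gives hidden layers of 20, 40, 60, 80, and 100 neurons. In the toy fragments, a byte value that occurs in every fragment (here `P`) gets weight 0 under the assumed IDF, which is why such bytes drop out of the selected features.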

Source journal: Cybersecurity (Computer Science, Information Systems). CiteScore: 7.30. Self-citation rate: 0.00%. Articles published: 77. Review time: 9 weeks.