Hypervisor-based data synthesis: On its potential to tackle the curse of client-side agent remnants in forensic image generation

IF 2.0 | Zone 4, Medicine | JCR Q3, Computer Science, Information Systems
Dennis Wolf, Thomas Göbel, Harald Baier
Journal: Forensic Science International-Digital Investigation
Publication date: 2024-03-01
Publication type: Journal Article
DOI: 10.1016/j.fsidi.2023.301690
Article page: https://www.sciencedirect.com/science/article/pii/S2666281723002093
Full text (PDF): https://www.sciencedirect.com/science/article/pii/S2666281723002093/pdfft?md5=e999660e34e9dfdd4cd9e4ea9eab250e&pid=1-s2.0-S2666281723002093-main.pdf
Citations: 0

Abstract

In the field of digital forensics, the number and heterogeneity of devices typically involved in an investigation are increasing. To train digital forensics practitioners and to make faster progress in the development and validation of forensic tools, up-to-date data sets are in high demand. However, manually creating data sets is a complex, tedious, and time-consuming task, which increases the need for automated solutions. Existing data generation frameworks typically use components that run directly on the simulated client (e.g., a client-side agent controlled via SSH). On the one hand, this facilitates simulation by providing direct feedback from the client and the ability to use client-side libraries to access software. On the other hand, this approach creates unintended traces in the generated data sets that quickly reveal their synthetic origin and reduce their realism and thus their relevance. To avoid such traces, this paper presents a hypervisor-based solution that eliminates the client-side software component from a recent digital forensic data set generator and compensates for its absence through host-side means only. To demonstrate the practicability of the proposed approach as well as the indistinguishability of the generated traces, a multi-participant scenario is carried out as a proof of concept that replicates a realistic attack on a Linux system from a Kali attacker machine. In the evaluation, the generated data set is compared, in terms of unintended traces and realism, to a data set generated by the same framework using an agent component. In this way, we demonstrate the benefits and overall usefulness of an agent-less data synthesis approach.
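
To make the idea of host-side control more concrete, the following is a minimal sketch of how keystrokes could be injected into a guest from the hypervisor host, so that no agent or SSH session has to exist inside the guest. It is not the paper's implementation: the use of libvirt's `virsh send-key` command, the domain name `victim-vm`, the US keyboard layout, and the helper names `keycodes_for` and `send_key_commands` are all assumptions made for illustration. The sketch only builds the command lines and does not run them.

```python
# Illustrative sketch only: builds `virsh send-key` command lines that a host
# could run to type text into a guest. Assumes libvirt and a US keyboard
# layout; the abstract does not state which hypervisor interface is used.

# Hypothetical subset of characters mapped to Linux input keycode names.
SPECIAL_KEYS = {
    " ": "KEY_SPACE",
    "\n": "KEY_ENTER",
    "-": "KEY_MINUS",
    ".": "KEY_DOT",
    "/": "KEY_SLASH",
}


def keycodes_for(char):
    """Return the keycode names that must be pressed together for one character."""
    if char in SPECIAL_KEYS:
        return [SPECIAL_KEYS[char]]
    if char.isascii() and char.isalpha():
        name = "KEY_" + char.upper()
        # Upper-case letters are produced by holding shift with the letter key.
        return ["KEY_LEFTSHIFT", name] if char.isupper() else [name]
    if char.isascii() and char.isdigit():
        return ["KEY_" + char]
    raise ValueError(f"no keycode mapping for {char!r} in this sketch")


def send_key_commands(domain, text):
    """Build one `virsh send-key` argument list per character of `text`."""
    return [
        ["virsh", "send-key", domain, "--codeset", "linux", *keycodes_for(char)]
        for char in text
    ]


if __name__ == "__main__":
    # "victim-vm" is a hypothetical libvirt domain name.
    for argv in send_key_commands("victim-vm", "ls -A\n"):
        print(" ".join(argv))
```

Each argument list could be passed to `subprocess.run` on the host. Because the keystrokes arrive through the virtual keyboard device, the guest would see ordinary input rather than a remote login, but the host receives no direct feedback and would have to observe the guest by other host-side means.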

Source journal metrics: CiteScore 5.90; self-citation rate 15.00%; articles published 87; review time 76 days