{"title":"Towards a unified XAI-based framework for digital forensic investigations","authors":"Zainab Khalid , Farkhund Iqbal , Benjamin C.M. Fung","doi":"10.1016/j.fsidi.2024.301806","DOIUrl":"10.1016/j.fsidi.2024.301806","url":null,"abstract":"<div><div>Explainable Artificial Intelligence (XAI) aims to alleviate the black-box AI conundrum in the field of Digital Forensics (DF) (and others) by providing layman-interpretable explanations of predictions made by AI models. It also handles the increasing volumes of forensic images that are impossible to investigate via manual methods or even automated forensic tools. A holistic, generalized, yet exhaustive framework detailing the workflow of XAI for DF is proposed for standardization. A case study examining the implementation of the framework in a network forensics investigative scenario is presented for demonstration. In addition, the XAI-DF project lays the basis for a collaborative effort from the forensics community, aimed at creating an open-source forensic database that may be employed to train AI models for the digital forensics domain. 
As an initial contribution to the project, we create a memory forensics database of 27 memory dumps (Windows 7, 10, and 11) simulating malware activity and extract relevant features (specific to processes, injected code, network connections, API hooks, and process privileges) that may be used for training, testing, and validating AI models in keeping with the XAI-DF framework.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GBKPA and AuxShield: Addressing adversarial robustness and transferability in android malware detection","authors":"Kumarakrishna Valeti, Hemant Rathore","doi":"10.1016/j.fsidi.2024.301816","DOIUrl":"10.1016/j.fsidi.2024.301816","url":null,"abstract":"<div><div>Android stands as the predominant operating system within the mobile ecosystem. Users can download applications from official sources like <em>Google Play Store</em> and other third-party platforms. However, malicious actors can attempt to compromise user device integrity through malicious applications. Traditionally, signatures, rules, and other methods have been employed to detect malware attacks and protect device integrity. However, the growing number and complexity of malicious applications have prompted the exploration of newer techniques like machine learning (ML) and deep learning (DL). Many recent studies have demonstrated promising results in detecting malicious applications using ML and DL solutions. However, research in other fields, such as computer vision, has shown that ML and DL solutions are vulnerable to targeted adversarial attacks. Malicious actors can develop malicious adversarial applications that can bypass ML- and DL-based anti-viruses. The study of adversarial techniques related to malware detection has now captured the security community’s attention. In this work, we utilise Android permissions and intents to construct 28 distinct malware detection models using 14 classification algorithms. Later, we introduce a novel targeted false-negative evasion attack, <em>Gradient Based K Perturbation Attack (GBKPA)</em>, designed for grey-box knowledge scenarios to assess the robustness of these models. The GBKPA attempts to craft malicious adversarial samples by making minimal perturbations without violating the syntactic and functional structure of the application. GBKPA achieved an average fooling rate (FR) of 77 % with only five perturbations across the 28 detection models. 
Additionally, we identified the most vulnerable Android permissions and intents that malicious actors can exploit for evasion attacks. Furthermore, we analyse the transferability of adversarial samples across different classes of models and provide explanations for the same. Finally, we propose the <em>AuxShield</em> defence mechanism to develop robust detection models. AuxShield reduced the average FR to 3.25 % against 28 detection models. Our findings underscore the need to understand the causation of adversarial samples, their transferability, and robust defence strategies before deploying ML and DL solutions in the real world.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
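The GBKPA record above describes a gradient-guided evasion attack that changes only a handful of features and then reports a fooling rate. The following is a minimal sketch of that general idea, not the authors' GBKPA implementation. It assumes a hypothetical logistic-regression detector over binary permission/intent features, and it assumes that only adding features (0 to 1) preserves the application's functionality. All function names and parameters are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def malware_score(weights, bias, x):
    """Probability of 'malicious' under a toy logistic-regression detector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def k_perturbation_attack(weights, bias, x, k=5, threshold=0.5):
    """Add at most k absent features (0 -> 1), picking the steepest descent each step.

    Assumption: features are only added, never removed, so the app keeps
    every permission/intent it originally declared.
    """
    adv = list(x)
    for _ in range(k):
        p = malware_score(weights, bias, adv)
        if p < threshold:
            break  # already classified as benign
        # For logistic regression, d(score)/d(x_i) = p * (1 - p) * w_i.
        candidates = [
            (p * (1.0 - p) * w, i)
            for i, (w, xi) in enumerate(zip(weights, adv))
            if xi == 0 and w < 0
        ]
        if not candidates:
            break  # no remaining feature lowers the score
        _, best = min(candidates)  # most negative gradient
        adv[best] = 1
    return adv


def fooling_rate(weights, bias, samples, k=5, threshold=0.5):
    """Fraction of initially detected samples that evade detection after the attack."""
    detected = [x for x in samples if malware_score(weights, bias, x) >= threshold]
    if not detected:
        return 0.0
    fooled = sum(
        1
        for x in detected
        if malware_score(weights, bias, k_perturbation_attack(weights, bias, x, k, threshold)) < threshold
    )
    return fooled / len(detected)
```

With a real detector the gradient would come from the model under attack (or from a surrogate in a grey-box setting) rather than from known weights; the greedy loop and the fooling-rate bookkeeping would stay the same.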
{"title":"The provenance of Apple Health data: A timeline of update history","authors":"Luke Jennings , Matthew Sorell , Hugo G. Espinosa","doi":"10.1016/j.fsidi.2024.301804","DOIUrl":"10.1016/j.fsidi.2024.301804","url":null,"abstract":"<div><div>Fitness tracking smart watches are becoming more prevalent in investigations and the need to understand and document their forensic potential and limitations is important for practitioners and researchers. Such fitness devices have undergone several hardware and software upgrades, changing the way they operate and evolving as more sophisticated pieces of technology. One example is the Apple Watch, working in conjunction with the Apple iPhone, to measure and record a vast amount of health information in the Apple Health database, <em>healthdb</em>_<em>secure</em>.<em>sqlite</em>. Over time, an end user will update their devices, but their health data, uniquely, carries over from one device to the next. In this paper, we investigate and analyse the hardware and software provenance of a real 5+ year Apple Health dataset to determine changes, patterns and anomalies over time. This provenance investigation provides insights in the form of (1) a timeline, representing the dataset's history of device and firmware updates that can be used in the context of investigation validation, (2) anomaly detection, and (3) insights into cyber hygiene. 
Analysis of the non-health data recorded in the health database arguably provides just as much insightful information as the health data itself.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mount SMB.pcap: Reconstructing file systems and file operations from network traffic","authors":"Jan-Niclas Hilgert, Axel Mahr, Martin Lambertz","doi":"10.1016/j.fsidi.2024.301807","DOIUrl":"10.1016/j.fsidi.2024.301807","url":null,"abstract":"<div><div>File system and network forensics are fundamental in forensic investigations, but are often treated as distinct disciplines. This work seeks to unify these fields by introducing a novel framework capable of mounting network captures, enabling investigators to seamlessly browse data using conventional tools. Although our implementation supports various protocols such as HTTP, TLS, and FTP, this work will particularly focus on the complexities of the Server Message Block (SMB) protocol, which is fundamental for shared file system access, especially within local networks.</div><div>For this, we present a detailed methodology to extract essential file system data from SMB network traffic, aiming to reconstruct the share's file system as faithfully to the original as possible. Our approach goes beyond traditional tools like Wireshark, which typically only extract individual files from SMB transmissions. Instead, we reconstruct the entire file system hierarchy, retrieve all associated metadata, and handle multiple versions of files captured within the same network traffic. In addition, we also investigate how file operations impact SMB commands and show how these can be used to accurately recreate user activities on an SMB share based solely on network traffic. 
Although both methodologies and implementations can be applied independently, their combination provides investigators with a comprehensive view of the reconstructed file system along with the corresponding user activities extracted from network traffic.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Re-imagen: Generating coherent background activity in synthetic scenario-based forensic datasets using large language models","authors":"Lena L. Voigt , Felix Freiling , Christopher J. Hargreaves","doi":"10.1016/j.fsidi.2024.301805","DOIUrl":"10.1016/j.fsidi.2024.301805","url":null,"abstract":"<div><div>Due to legal and privacy-related restrictions, the generation of <em>synthetic</em> data is recommended for creating datasets for digital forensic education and training. One challenge when synthesizing scenario-based forensic data is the creation of coherent background activity besides evidential actions. This work leverages the creative writing abilities of large language models (LLMs) to generate personas and actions that describe the background usage of a device consistent with the created persona. These actions are subsequently converted into a machine-readable format and executed on a virtualized device using VM control automation. We introduce Re-imagen, a framework that combines state-of-the-art LLMs and a recent unintrusive GUI automation tool to produce synthetic disk images that contain arguably coherent “wear-and-tear” artifacts that current synthesis platforms lack. 
While, for now, the focus is on the coherence of the generated background activity, we believe that the proposed approach is a step toward more <em>realistic</em> synthetic disk image generation.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting logical image formats for future digital forensics: A comprehensive analysis on L01 and AFF4-L","authors":"Sorin Im , Hyunah Park , Jihun Joun , Sangjin Lee , Jungheum Park","doi":"10.1016/j.fsidi.2024.301811","DOIUrl":"10.1016/j.fsidi.2024.301811","url":null,"abstract":"<div><div>As the capacity of storage devices continues to increase significantly and cloud environments emerge, there is a need to perform logical imaging to selectively collect specific data relevant to a case. However, there is currently insufficient research addressing the appropriateness and usability of logical image file formats, which could potentially raise issues in terms of the originality and integrity of digital evidence. This study performs a comprehensive analysis of the internal structures and metadata of existing proprietary and open-source logical image file formats, with a particular focus on the L01 and AFF4-L. Furthermore, this study reveals several limitations of each file format and the supporting tools through practical experiments including metadata manipulation and stress tests. More specifically, the potential for loss of originality and metadata manipulation during and after logical imaging underscores the necessity for the development and standardization of more advanced logical image file formats to systematically manage different types of digital evidence from different sources. 
The findings of this research also demonstrate the necessity of collective efforts from the community for the continuous improvement of logical image file formats.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geotagging accuracy in smartphone photography","authors":"Elénore Ryser , Hannes Spichiger , David-Olivier Jaquet-Chiffelle","doi":"10.1016/j.fsidi.2024.301813","DOIUrl":"10.1016/j.fsidi.2024.301813","url":null,"abstract":"<div><div>After a decade of technological advancements, digital forensic science is under increasing pressure to deliver investigative findings with a high degree of scientific rigor. The judicial community has voiced growing concerns regarding digital traces and their interpretation. This research focuses on assessing the significance of geolocation information embedded within the metadata of photographs captured using a mobile phone. In order to examine the variability in the accuracy of this geolocation metadata and identify potential external influences, images were taken at 29 different locations distributed along three distinct paths. The photographs were captured using two Samsung Galaxy S8 SM-G950F devices running on Android 8.0. Various configurations of GNSS and mobile network connections were tested, and their potential impact on the accuracy of geolocation metadata was investigated. The findings show the dependency of geolocation accuracy on the specific measurement location. 
This research ultimately highlights the imperative for evaluative approaches to take into account the specific characteristics of each point of interest, as opposed to leaning on broad statements about the reliability of geolocation processes in general.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
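The geotagging record above evaluates how far photo geotags fall from the true capture location. The abstract does not state which distance metric the authors used. As an illustrative assumption, the sketch below measures the error as the great-circle (haversine) distance between each geotag and a surveyed reference point, then summarizes the errors per location. The function names and the spherical-Earth radius are choices made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS 84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def summarize_errors(reference, geotags):
    """Summarize geotag errors for one measurement location.

    reference: (lat, lon) of the surveyed ground-truth point.
    geotags:   list of (lat, lon) pairs read from photo metadata.
    """
    if not geotags:
        raise ValueError("no geotags to evaluate")
    errors = sorted(haversine_m(reference[0], reference[1], lat, lon) for lat, lon in geotags)
    n = len(errors)
    median = errors[n // 2] if n % 2 else (errors[n // 2 - 1] + errors[n // 2]) / 2
    return {"n": n, "median_m": median, "max_m": errors[-1]}
```

Summarizing per location rather than pooling all photographs matches the abstract's point that accuracy depends on the specific measurement location.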
{"title":"MIC: Memory analysis of IndexedDB data on Chromium-based applications","authors":"Byeongchan Jeong, Sangjin Lee, Jungheum Park","doi":"10.1016/j.fsidi.2024.301809","DOIUrl":"10.1016/j.fsidi.2024.301809","url":null,"abstract":"<div><div>As Chromium-based applications continue to gain popularity, it is necessary for forensic investigators to obtain a comprehensive understanding of how they store and manage browsing artifacts from both filesystem and memory perspectives. In particular, the <em>incognito</em> mode developed in the current version of Chromium uses only physical memory to manage data related to active sessions. Therefore, handling physical memory is essential for tracking a user's browsing behaviour in incognito mode. This paper provides an in-depth examination of LevelDB, a lightweight key-value database utilized as Chromium's implementation for IndexedDB. In particular, we delve into the details of how IndexedDB data is managed through LevelDB, taking into account its low-level database file format. Furthermore, we thoroughly explore the possibility of residual data, both complete and incomplete, being retained as applications create and initialize IndexedDB-related data. Based on our research findings, we propose a systematic methodology for inspecting the internal structures of LevelDB-related C++ classes, carving these structures from binary streams, and interpreting the data for forensic analysis. 
In addition, we develop a proof-of-concept tool based on our approach and demonstrate its performance and effectiveness through case studies.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video source identification using machine learning: A case study of 16 instant messaging applications","authors":"Hyomin Yang , Junho Kim , Jungheum Park","doi":"10.1016/j.fsidi.2024.301812","DOIUrl":"10.1016/j.fsidi.2024.301812","url":null,"abstract":"<div><div>In recent years, there has been a notable increase in the prevalence of cybercrimes related to video content, including the distribution of illegal videos and the sharing of copyrighted material. This has led to the growing importance of identifying the source of video files to trace the owner of the files involved in the incident or identify the distributor. Previous research has concentrated on revealing the device (brand and/or model) that “originally” created a video file. This has been achieved by analysing the pattern noise generated by the image sensor in the camera, the storage structural features of the file, and the metadata patterns. However, due to the widespread use of mobile environments, instant messaging applications (IMAs) such as Telegram and Wire have been utilized to share illegal videos, which can result in the loss of information from the original file due to re-encoding at the application level, depending on the transmission settings. Consequently, it is necessary to extend the scope of existing research to identify the various applications that are capable of re-encoding video files in transit. Furthermore, it is essential to determine whether there are features that can be leveraged to distinguish them from the source identification perspective. In this paper, we propose a machine learning-based methodology for classifying the source application by extracting various features stored in the storage format and internal metadata of video files. To conduct this study, we analyzed 16 IMAs that are widely used in mobile environments and generated a total of 1974 sample videos, taking into account both the transmission options and encoding settings offered by each IMA. 
The training and testing results on this dataset indicate that the ExtraTrees model achieved an identification accuracy of approximately 99.96 %. Furthermore, we developed a proof-of-concept tool based on the proposed method, which extracts the suggested features from videos and queries a pre-trained model. This tool is released as open-source software for the community.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
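The video source identification record above relies on features taken from the storage format and internal metadata of video files. The abstract does not list the exact features. Purely as an illustration, and assuming the samples are ISO Base Media File Format (MP4-style) files, the sketch below extracts one simple structural feature: the sequence and sizes of top-level boxes. The function name is hypothetical.

```python
import struct


def top_level_boxes(data: bytes):
    """Return [(box_type, size), ...] for the top-level boxes of an ISO BMFF byte string.

    Each box starts with a 4-byte big-endian size and a 4-byte type.
    size == 1 means a 64-bit size follows; size == 0 means the box runs to end of data.
    """
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            if offset + 16 > len(data):
                break  # truncated 64-bit header
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # malformed or truncated box; stop rather than guess
        boxes.append((box_type.decode("latin-1"), size))
        offset += size
    return boxes
```

The resulting box order (for example whether `moov` precedes `mdat`) is the kind of structural signal that could be encoded as categorical features for a classifier; which features actually separate the 16 applications is a question the paper answers, not this sketch.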
{"title":"Welcome to the proceedings of the Fourth Annual DFRWS APAC Conference 2024!","authors":"Raymond Chan","doi":"10.1016/j.fsidi.2024.301819","DOIUrl":"10.1016/j.fsidi.2024.301819","url":null,"abstract":"","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}