法医数字文件检验工具类型识别

IF 2.2 4区医学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Forensic Science International-Digital Investigation Pub Date : 2025-08-04 DOI:10.1016/j.fsidi.2025.301972

Muhammad Abdul Moiz Zia, Oluwasola Mary Adedayo

{"title":"法医数字文件检验工具类型识别","authors":"Muhammad Abdul Moiz Zia, Oluwasola Mary Adedayo","doi":"10.1016/j.fsidi.2025.301972","DOIUrl":null,"url":null,"abstract":"<div><div>Digital documents have become a significant part of our everyday lives. From identity documents to various legal agreements and business communications, the ability to determine the authenticity and origin of different types of documents is incredibly important. In the physical domain, this need is addressed by forensic document examiners. Although many of the analysis methods used in the physical domain do not apply in the digital realm, the forensic analysis processes in both realms still address similar objectives. In this paper, we focus on the objective of identifying the tool that created a digital document to support answering questions about the origin of a document. In contrast to many existing works on the forensic analysis of digital documents which focus on file type identification, this paper focuses on identifying the tool that is used to create a document. This is particularly relevant for forensic digital document examination (FDDE). The paper explores the use of different machine learning algorithms to analyze PDF documents to determine the tool that created the document. Given that traditional methods for digital document analysis often rely on metadata and visible content that can be tampered with, we used a structural analysis approach that builds on methods that have previously been used for file type identification. We explored the use of byte histograms and entropy measurements in developing models capable of identifying the specific software used to create PDF documents using several machine learning models. Our results showed that Convolutional Neural Networks (CNNs) outperformed other models. In further experiments, we explored the use of the same approach to identify the version of a specific tool used to create a document and alternative ways of creating PDFs from a tool. Our results confirm the feasibility of this approach for digital document tool type identification with a high level of accuracy.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"54 ","pages":"Article 301972"},"PeriodicalIF":2.2000,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Tool type identification for forensic digital document examination\",\"authors\":\"Muhammad Abdul Moiz Zia, Oluwasola Mary Adedayo\",\"doi\":\"10.1016/j.fsidi.2025.301972\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Digital documents have become a significant part of our everyday lives. From identity documents to various legal agreements and business communications, the ability to determine the authenticity and origin of different types of documents is incredibly important. In the physical domain, this need is addressed by forensic document examiners. Although many of the analysis methods used in the physical domain do not apply in the digital realm, the forensic analysis processes in both realms still address similar objectives. In this paper, we focus on the objective of identifying the tool that created a digital document to support answering questions about the origin of a document. In contrast to many existing works on the forensic analysis of digital documents which focus on file type identification, this paper focuses on identifying the tool that is used to create a document. This is particularly relevant for forensic digital document examination (FDDE). The paper explores the use of different machine learning algorithms to analyze PDF documents to determine the tool that created the document. Given that traditional methods for digital document analysis often rely on metadata and visible content that can be tampered with, we used a structural analysis approach that builds on methods that have previously been used for file type identification. We explored the use of byte histograms and entropy measurements in developing models capable of identifying the specific software used to create PDF documents using several machine learning models. Our results showed that Convolutional Neural Networks (CNNs) outperformed other models. In further experiments, we explored the use of the same approach to identify the version of a specific tool used to create a document and alternative ways of creating PDFs from a tool. Our results confirm the feasibility of this approach for digital document tool type identification with a high level of accuracy.</div></div>\",\"PeriodicalId\":48481,\"journal\":{\"name\":\"Forensic Science International-Digital Investigation\",\"volume\":\"54 \",\"pages\":\"Article 301972\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2025-08-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Forensic Science International-Digital Investigation\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666281725001118\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Forensic Science International-Digital Investigation","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666281725001118","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

数字文档已经成为我们日常生活中重要的一部分。从身份文件到各种法律协议和商业通信，确定不同类型文件的真实性和来源的能力是非常重要的。在物理领域，这一需求由法医文件审查员解决。尽管物理领域中使用的许多分析方法并不适用于数字领域，但这两个领域的取证分析过程仍然解决类似的目标。在本文中，我们关注的目标是识别创建数字文档的工具，以支持回答有关文档起源的问题。与许多现有的专注于文件类型识别的数字文档取证分析工作不同，本文侧重于识别用于创建文档的工具。这与法医数字文件检查（FDDE）特别相关。本文探讨了使用不同的机器学习算法来分析PDF文档，以确定创建文档的工具。考虑到数字文档分析的传统方法通常依赖于可以被篡改的元数据和可见内容，我们使用了一种结构分析方法，该方法建立在以前用于文件类型识别的方法之上。我们探索了字节直方图和熵测量在开发模型中的使用，这些模型能够识别用于使用几个机器学习模型创建PDF文档的特定软件。我们的研究结果表明，卷积神经网络（cnn）优于其他模型。在进一步的实验中，我们探索了使用相同的方法来识别用于创建文档的特定工具的版本，以及从工具创建pdf的替代方法。我们的结果证实了这种方法在数字文档工具类型识别方面的可行性，并且具有很高的准确性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Tool type identification for forensic digital document examination

Digital documents have become a significant part of our everyday lives. From identity documents to various legal agreements and business communications, the ability to determine the authenticity and origin of different types of documents is incredibly important. In the physical domain, this need is addressed by forensic document examiners. Although many of the analysis methods used in the physical domain do not apply in the digital realm, the forensic analysis processes in both realms still address similar objectives. In this paper, we focus on the objective of identifying the tool that created a digital document to support answering questions about the origin of a document. In contrast to many existing works on the forensic analysis of digital documents which focus on file type identification, this paper focuses on identifying the tool that is used to create a document. This is particularly relevant for forensic digital document examination (FDDE). The paper explores the use of different machine learning algorithms to analyze PDF documents to determine the tool that created the document. Given that traditional methods for digital document analysis often rely on metadata and visible content that can be tampered with, we used a structural analysis approach that builds on methods that have previously been used for file type identification. We explored the use of byte histograms and entropy measurements in developing models capable of identifying the specific software used to create PDF documents using several machine learning models. Our results showed that Convolutional Neural Networks (CNNs) outperformed other models. In further experiments, we explored the use of the same approach to identify the version of a specific tool used to create a document and alternative ways of creating PDFs from a tool. Our results confirm the feasibility of this approach for digital document tool type identification with a high level of accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Forensic Science International-Digital Investigation Multiple-

CiteScore

5.90

自引率

15.00%

发文量

审稿时长

76 days