{"title":"法医数字文件检验工具类型识别","authors":"Muhammad Abdul Moiz Zia, Oluwasola Mary Adedayo","doi":"10.1016/j.fsidi.2025.301972","DOIUrl":null,"url":null,"abstract":"<div><div>Digital documents have become a significant part of our everyday lives. From identity documents to various legal agreements and business communications, the ability to determine the authenticity and origin of different types of documents is incredibly important. In the physical domain, this need is addressed by forensic document examiners. Although many of the analysis methods used in the physical domain do not apply in the digital realm, the forensic analysis processes in both realms still address similar objectives. In this paper, we focus on the objective of identifying the tool that created a digital document to support answering questions about the origin of a document. In contrast to many existing works on the forensic analysis of digital documents which focus on file type identification, this paper focuses on identifying the tool that is used to create a document. This is particularly relevant for forensic digital document examination (FDDE). The paper explores the use of different machine learning algorithms to analyze PDF documents to determine the tool that created the document. Given that traditional methods for digital document analysis often rely on metadata and visible content that can be tampered with, we used a structural analysis approach that builds on methods that have previously been used for file type identification. We explored the use of byte histograms and entropy measurements in developing models capable of identifying the specific software used to create PDF documents using several machine learning models. Our results showed that Convolutional Neural Networks (CNNs) outperformed other models. In further experiments, we explored the use of the same approach to identify the version of a specific tool used to create a document and alternative ways of creating PDFs from a tool. Our results confirm the feasibility of this approach for digital document tool type identification with a high level of accuracy.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"54 ","pages":"Article 301972"},"PeriodicalIF":2.2000,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Tool type identification for forensic digital document examination\",\"authors\":\"Muhammad Abdul Moiz Zia, Oluwasola Mary Adedayo\",\"doi\":\"10.1016/j.fsidi.2025.301972\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Digital documents have become a significant part of our everyday lives. From identity documents to various legal agreements and business communications, the ability to determine the authenticity and origin of different types of documents is incredibly important. In the physical domain, this need is addressed by forensic document examiners. Although many of the analysis methods used in the physical domain do not apply in the digital realm, the forensic analysis processes in both realms still address similar objectives. In this paper, we focus on the objective of identifying the tool that created a digital document to support answering questions about the origin of a document. In contrast to many existing works on the forensic analysis of digital documents which focus on file type identification, this paper focuses on identifying the tool that is used to create a document. This is particularly relevant for forensic digital document examination (FDDE). The paper explores the use of different machine learning algorithms to analyze PDF documents to determine the tool that created the document. Given that traditional methods for digital document analysis often rely on metadata and visible content that can be tampered with, we used a structural analysis approach that builds on methods that have previously been used for file type identification. We explored the use of byte histograms and entropy measurements in developing models capable of identifying the specific software used to create PDF documents using several machine learning models. Our results showed that Convolutional Neural Networks (CNNs) outperformed other models. In further experiments, we explored the use of the same approach to identify the version of a specific tool used to create a document and alternative ways of creating PDFs from a tool. Our results confirm the feasibility of this approach for digital document tool type identification with a high level of accuracy.</div></div>\",\"PeriodicalId\":48481,\"journal\":{\"name\":\"Forensic Science International-Digital Investigation\",\"volume\":\"54 \",\"pages\":\"Article 301972\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2025-08-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Forensic Science International-Digital Investigation\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666281725001118\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Forensic Science International-Digital Investigation","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666281725001118","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Tool type identification for forensic digital document examination
Digital documents have become a significant part of our everyday lives. From identity documents to various legal agreements and business communications, the ability to determine the authenticity and origin of different types of documents is incredibly important. In the physical domain, this need is addressed by forensic document examiners. Although many of the analysis methods used in the physical domain do not apply in the digital realm, the forensic analysis processes in both realms still address similar objectives. In this paper, we focus on the objective of identifying the tool that created a digital document to support answering questions about the origin of a document. In contrast to many existing works on the forensic analysis of digital documents which focus on file type identification, this paper focuses on identifying the tool that is used to create a document. This is particularly relevant for forensic digital document examination (FDDE). The paper explores the use of different machine learning algorithms to analyze PDF documents to determine the tool that created the document. Given that traditional methods for digital document analysis often rely on metadata and visible content that can be tampered with, we used a structural analysis approach that builds on methods that have previously been used for file type identification. We explored the use of byte histograms and entropy measurements in developing models capable of identifying the specific software used to create PDF documents using several machine learning models. Our results showed that Convolutional Neural Networks (CNNs) outperformed other models. In further experiments, we explored the use of the same approach to identify the version of a specific tool used to create a document and alternative ways of creating PDFs from a tool. Our results confirm the feasibility of this approach for digital document tool type identification with a high level of accuracy.