Source printer identification from document images acquired using smartphone

IF 3.7 · Zone 2, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Information Security and Applications Pub Date : 2024-06-13 DOI:10.1016/j.jisa.2024.103804

Sharad Joshi, Suraj Saxena, Nitin Khanna

Journal of Information Security and Applications, Volume 84, Article 103804. Journal Article. https://www.sciencedirect.com/science/article/pii/S2214212624001078

Citations: 0

Abstract

Printed documents continue to be used in large volumes for applications both important and trivial. Such applications often rely on information provided as printed text documents, whose integrity is difficult to verify under time constraints and limited resources. Source printer identification provides essential information about the origin and integrity of a printed document quickly and cost-effectively. Even after a fraudulent document has been identified, information about its origin can help prevent future fraud. If a smartphone camera replaces the scanner in the document acquisition process, document forensics becomes more economical, more user-friendly, and faster in the many applications where remote and distributed analysis is beneficial. Building on existing methods, we propose to learn a single CNN model from the fusion of letter images and their printer-specific noise residuals. In the absence of any publicly available dataset, we created a new dataset of 2250 images of text documents printed by eighteen printers and acquired with a smartphone camera under five acquisition settings. The proposed method achieves 98.42% document classification accuracy using images of the letter ‘e’ under 5 × 2 cross-validation. Further, when tested on about half a million letters of all types, it achieves letter and document classification accuracies of 90.33% and 98.01%, respectively, which shows that a discriminative model can be learned without depending on a single letter type. Classification accuracies are also encouraging under various acquisition settings, including low illumination and changes in the angle between the document and camera planes.
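
The abstract reports results under 5 × 2 cross-validation and gives separate letter-level and document-level accuracies. The sketch below illustrates how such an evaluation could be organised. It is not the authors' code: the abstract does not say how letter predictions are combined into a document label, so majority voting is an assumption here, as is splitting at the document level, and the function names `five_by_two_splits`, `document_label`, and `accuracy` are hypothetical.

```python
import random
from collections import Counter


def five_by_two_splits(doc_ids, seed=0):
    """Yield (train, test) document-ID lists for 5 x 2 cross-validation.

    Five repetitions of a random 50/50 split; each half serves once as the
    training set and once as the test set, giving ten train/test runs.
    Splitting by document (an assumption) keeps all letters of one document
    on the same side of the split.
    """
    rng = random.Random(seed)
    for _ in range(5):
        shuffled = list(doc_ids)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        fold_a, fold_b = shuffled[:half], shuffled[half:]
        yield fold_a, fold_b
        yield fold_b, fold_a


def document_label(letter_predictions):
    """Combine per-letter printer predictions by majority vote (assumed rule)."""
    return Counter(letter_predictions).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Toy example: five letters from one document printed by printer "P3".
    letters = ["P3", "P3", "P7", "P3", "P1"]
    truth = ["P3"] * len(letters)
    print("letter accuracy:", accuracy(letters, truth))   # 3 of 5 correct
    print("document label:", document_label(letters))     # majority is "P3"
    print("runs:", len(list(five_by_two_splits(range(10)))))
```

In the toy example only three of five letters are classified correctly, yet the document is still attributed to the right printer. Under a voting rule of this kind, document accuracy can exceed letter accuracy, which is consistent with the 98.01% versus 90.33% figures reported above.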




Source journal

Journal of Information Security and Applications (Computer Science: Computer Networks and Communications)

CiteScore

10.90

Self-citation rate

5.40%

Articles published

206

Review time

56 days

About the journal: Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security. JISA links a vibrant scientific and research community with industry professionals by offering a clear view of modern problems and challenges in information security and by identifying promising scientific and "best-practice" solutions. JISA issues balance original research work and innovative industrial approaches by internationally renowned information security experts and researchers.