Teerath Kumar, Muhammad Turab, Shahnawaz Talpur, Rob Brennan, Malika Bendechache
International Journal of Artificial Intelligence & Applications, published 2022-03-31
DOI: 10.5121/ijaia.2022.13202 (https://doi.org/10.5121/ijaia.2022.13202)
Citations: 7
Forged Character Detection Datasets: Passports, Driving Licences and Visa Stickers
Abstract
Forged documents, specifically passports, driving licences and visa stickers, are used for fraudulent purposes, including robbery, theft and more. Detecting forged characters in documents is therefore an important and challenging task in digital forensic imaging. Forged character detection faces two major challenges. First, data for forged character detection is extremely difficult to obtain: access to data is limited, available data is unlabeled, or prior work was done on private data. Second, deep learning (DL) algorithms require labeled data, and labeling is tedious, time-consuming and expensive, and requires domain expertise. To address these issues, in this paper we propose a novel algorithm that generates three datasets: forged character detection for passports (FCD-P), forged character detection for driving licences (FCD-D) and forged character detection for visa stickers (FCD-V). To the best of our knowledge, we are the first to release such datasets. The proposed algorithm starts by reading plain document images and then simulates forgery on the passports, driving licences and visa stickers of five different countries. It records the bounding boxes of the forged characters, which serve as labels. Furthermore, to reflect real-world scenarios, we apply selected data augmentations. Each dataset consists of 15,000 images, each of size 950 x 550. To support further research, we release our algorithm code and the three datasets, FCD-P, FCD-D and FCD-V.
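The labeling idea in the abstract (forge some characters, keep their bounding boxes as labels) can be illustrated with a minimal sketch. This is not the authors' released code: the `Box` type, the `blank_document` and `forge_characters` helpers, the character layout and the pixel overwrite used as a "forgery" are all hypothetical stand-ins, and treating 950 as the width and 550 as the height is an assumption.

```python
import random
from dataclasses import dataclass

# Image size from the abstract; the width/height order is an assumption.
WIDTH, HEIGHT = 950, 550


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box: top-left corner plus width and height."""
    x: int
    y: int
    w: int
    h: int


def blank_document(width=WIDTH, height=HEIGHT, background=255):
    """Stand-in for a plain document image: a grayscale grid of pixel rows."""
    return [[background] * width for _ in range(height)]


def forge_characters(image, char_boxes, n_forged, rng):
    """Overwrite n_forged character regions and return (forged_image, labels).

    The overwrite is only a placeholder for a real forgery operation.
    The boxes of the altered regions are returned as the labels.
    """
    forged = [row[:] for row in image]  # copy so the source image is untouched
    chosen = rng.sample(char_boxes, n_forged)
    for box in chosen:
        shade = rng.randrange(0, 128)  # hypothetical "tampered" ink intensity
        for yy in range(box.y, box.y + box.h):
            for xx in range(box.x, box.x + box.w):
                forged[yy][xx] = shade
    return forged, chosen


# Hypothetical layout: one text line of 20 character cells.
char_boxes = [Box(x=50 + 30 * i, y=100, w=20, h=30) for i in range(20)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = blank_document()
forged, labels = forge_characters(image, char_boxes, n_forged=3, rng=rng)
```

Because the labels are produced by the same step that alters the pixels, no manual annotation is needed. A real pipeline would replace the overwrite with rendered replacement characters and apply augmentation afterwards.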