IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection

Hong Guan, Yancheng Wang, Lulu Xie, Soham Nag, Rajeev Goel, Niranjan Erappa Narayana Swamy, Yingzhen Yang, Chaowei Xiao, Jonathan Prisby, Ross Maciejewski, Jia Zou
arXiv - CS - Multimedia. Published 2024-08-03. DOI: arxiv-2408.01690 (https://doi.org/arxiv-2408.01690)
Citations: 0

Abstract

Effective fraud detection and analysis of government-issued identity documents, such as passports, driver's licenses, and identity cards, are essential to thwarting identity theft and bolstering security on online platforms. Training accurate fraud detection and analysis tools depends on the availability of extensive identity document datasets. However, current publicly available benchmark datasets for identity document analysis, including MIDV-500, MIDV-2020, and FMIDV, fall short in several respects: they offer a limited number of samples, cover too few varieties of fraud patterns, and seldom include alterations to critical personally identifying fields such as portrait images. These gaps limit their utility for training models that can detect realistic fraud while preserving privacy. In response to these shortcomings, our research introduces a new benchmark dataset, IDNet, designed to advance privacy-preserving fraud detection efforts. The IDNet dataset comprises 837,060 images of synthetically generated identity documents, totaling approximately 490 gigabytes, categorized into 20 types from 10 U.S. states and 10 European countries. We evaluate the utility and present use cases of the dataset, illustrating how it can aid in training privacy-preserving fraud detection methods, generating camera and video captures of identity documents, and testing schema unification and other identity document management functionalities.
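To give a rough sense of scale, the figures quoted in the abstract can be turned into per-image and per-type averages. The sketch below is only back-of-the-envelope arithmetic: it assumes 1 GB = 10^9 bytes and treats "approximately 490 gigabytes" as exactly 490, neither of which the source states, and the function names are hypothetical. The abstract does not say the 20 document types are equally sized, so the per-type number is an average, not a reported split.

```python
# Figures quoted in the abstract.
TOTAL_IMAGES = 837_060
TOTAL_SIZE_GB = 490      # "approximately 490 gigabytes", treated as exact here
DOCUMENT_TYPES = 20      # 10 U.S. states + 10 European countries

# Assumption (not stated in the source): decimal gigabytes.
BYTES_PER_GB = 10**9


def average_image_size_kb(total_gb: float = TOTAL_SIZE_GB,
                          images: int = TOTAL_IMAGES) -> float:
    """Mean size per image in kilobytes (1 KB = 1000 bytes)."""
    return total_gb * BYTES_PER_GB / images / 1000


def average_images_per_type(images: int = TOTAL_IMAGES,
                            types: int = DOCUMENT_TYPES) -> float:
    """Mean number of images per document type (an average, not a reported split)."""
    return images / types


if __name__ == "__main__":
    print(f"Average image size: {average_image_size_kb():.0f} KB")
    print(f"Average images per document type: {average_images_per_type():.0f}")
```

Under these assumptions, each image averages a little under 600 KB and each document type averages roughly 42,000 images, which gives a sense of the storage and I/O budget needed before downloading the full set.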