Fraud Detection under Multi-Sourced Extremely Noisy Annotations

Chuang Zhang, Qizhou Wang, Tengfei Liu, Xun Lu, Jin Hong, Bo Han, Chen Gong
Published in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021-10-26. DOI: 10.1145/3459637.3482433 (https://doi.org/10.1145/3459637.3482433)
Citations: 5

Abstract

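Fraud detection in e-commerce aims to determine whether an online transaction or other activity is fraudulent, and it is critical to protecting the funds of users and financial corporations. The problem has previously been addressed with various fully supervised learning methods. In many real-world cases, however, the true labels needed to train a supervised fraud detection model are difficult to collect. To circumvent this issue, a series of automatic annotation techniques is employed instead to generate multiple noisy annotations for each unknown activity. To exploit these low-quality, multi-sourced annotations and still obtain reliable detection results, we propose an iterative two-stage fraud detection framework for multi-sourced, extremely noisy annotations. In the label aggregation stage, the multi-sourced labels are integrated by voting with adaptive weights; in the label correction stage, the correctness of the aggregated labels is estimated with the help of a handful of exactly labeled data, and the results are used to train a robust fraud detector. The two stages benefit from each other, and iterating them steadily improves the detection results. We therefore call our method "Label Aggregation and Correction" (LAC). In our experiments, we collect millions of transaction records from Alipay in two different fraud detection scenarios, namely credit card theft and promotion abuse fraud. Compared with state-of-the-art counterparts, our method improves the average AUC by at least 0.019 and 0.117 on the two collected datasets, respectively, which demonstrates its effectiveness.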
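The abstract describes the two-stage loop only at a high level. The Python sketch below is a hypothetical stand-in to make the idea concrete, not the authors' LAC algorithm. It assumes binary labels (1 = fraudulent, 0 = legitimate), uses log-odds weighted voting for the aggregation stage, estimates class-conditional correctness rates of the aggregated labels on the small exactly labeled subset for the correction stage (flipping a label class when it is more likely wrong than right), trains a plain logistic-regression detector with those rates as sample weights, and feeds the detector's predictions back to re-estimate the source weights. All function names (`weighted_vote`, `log_odds_weights`, `correctness_rates`, `train_logreg`, `aggregate_and_correct`) are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL sketch: an aggregate-then-correct loop inspired by the abstract.
# It is not the authors' LAC algorithm; every name below is illustrative.
# Labels are binary: 1 = fraudulent, 0 = legitimate.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_vote(annotations: Sequence[Sequence[int]], weights: Sequence[float]) -> List[int]:
    """Aggregation stage: every source casts a +1/-1 vote scaled by its weight."""
    labels = []
    for row in annotations:
        score = sum(w * (1 if a == 1 else -1) for a, w in zip(row, weights))
        labels.append(1 if score > 0 else 0)  # ties default to "legitimate"
    return labels


def log_odds_weights(annotations: Sequence[Sequence[int]], reference: Sequence[int]) -> List[float]:
    """Adaptive weights: log-odds of each source's smoothed agreement with reference labels.
    A source that is worse than chance gets a negative weight, which inverts its votes."""
    n = len(reference)
    weights = []
    for j in range(len(annotations[0])):
        agree = sum(1 for row, y in zip(annotations, reference) if row[j] == y)
        acc = (agree + 1) / (n + 2)  # Laplace smoothing keeps acc strictly inside (0, 1)
        weights.append(math.log(acc / (1 - acc)))
    return weights


def correctness_rates(agg: Sequence[int], clean: Dict[int, int]) -> Dict[int, float]:
    """Correction stage: estimate P(true == c | aggregated == c) on the exactly labeled subset."""
    rates = {}
    for c in (0, 1):
        idx = [i for i in clean if agg[i] == c]
        hits = sum(1 for i in idx if clean[i] == c)
        rates[c] = (hits + 1) / (len(idx) + 2)  # smoothed; 0.5 when there is no evidence
    return rates


def train_logreg(X, y, sw, epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Detector: logistic regression fit by full-batch gradient descent with sample weights."""
    d = len(X[0])
    w, b, total = [0.0] * d, 0.0, sum(sw)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t, s in zip(X, y, sw):
            err = s * (sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) - t)
            gw = [g + err * xk for g, xk in zip(gw, x)]
            gb += err
        w = [wk - lr * g / total for wk, g in zip(w, gw)]
        b -= lr * gb / total
    return w, b


def predict(model: Tuple[List[float], float], X) -> List[float]:
    """Fraud probabilities from the fitted detector."""
    w, b = model
    return [sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) for x in X]


def aggregate_and_correct(X, annotations, clean: Dict[int, int], rounds: int = 3):
    """Alternate aggregation, correction, and detector training for a fixed number of rounds."""
    weights = [1.0] * len(annotations[0])  # round 1 starts as a plain majority vote
    model = None
    for _ in range(rounds):
        agg = weighted_vote(annotations, weights)
        rates = correctness_rates(agg, clean)
        targets, sw = [], []
        for i, a in enumerate(agg):
            if i in clean:  # exactly labeled samples keep their true label at full weight
                label, conf = clean[i], 1.0
            elif rates[a] >= 0.5:  # aggregated label is probably right: keep it
                label, conf = a, rates[a]
            else:  # probably wrong: flip it and use the complementary confidence
                label, conf = 1 - a, 1 - rates[a]
            targets.append(label)
            sw.append(conf)
        model = train_logreg(X, targets, sw)
        # Feedback: detector predictions (plus the clean labels) re-estimate the source weights.
        pseudo = [clean.get(i, int(p >= 0.5)) for i, p in enumerate(predict(model, X))]
        weights = log_odds_weights(annotations, pseudo)
    return model, weights
```

Log-odds weighting is a common choice for weighted majority voting when sources err independently. Because a source that is worse than chance receives a negative weight, an always-wrong annotator ends up reinforcing the correct label instead of opposing it: on a toy set of eight one-dimensional samples with three annotators (one accurate, one with a single error, one always wrong) and two exactly labeled samples, this loop recovers the true labels. The paper's actual weighting, correction, and detector may differ.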