A Novel Approach for Unsupervised Learning of Highly-Imbalanced Data

2022 IEEE 4th International Conference on Cognitive Machine Intelligence (CogMI). Pub Date: 2022-12-01. DOI: 10.1109/CogMI56440.2022.00018

Robert K. L. Kennedy, Zahra Salekshahrezaee, T. Khoshgoftaar


Citations: 2

Abstract


Fraud datasets typically lack consistent and accurate labels and are highly imbalanced, with non-fraud examples greatly outnumbering fraudulent ones. This poses significant challenges for machine learning researchers and practitioners. An effective approach to identifying fraudulent data points must therefore handle highly imbalanced datasets and be robust to inconsistent or inaccurate class labels. This paper introduces a novel unsupervised procedure that learns from imbalanced datasets without class labels by iteratively cleaning the training dataset. Our methodology uses an autoencoder as the underlying learner. We describe its fraud detection performance and compare it to a baseline unsupervised fraud detection learner. Our results show that our procedure significantly outperforms the baseline in both AUC and TPR when tested on a publicly available, highly imbalanced credit card fraud detection dataset.
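
The abstract does not spell out how the training set is cleaned, so the following is only a minimal sketch of one plausible reading: fit an autoencoder on unlabeled data, drop the worst-reconstructed instances, and repeat. Everything specific here is an assumption made for illustration and is not taken from the paper: a linear autoencoder fit in closed form via SVD stands in for a neural autoencoder, and the function names, the `drop_fraction` and `n_iterations` parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(X, n_components):
    """Fit a linear autoencoder in closed form (same subspace as PCA)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(X, model):
    """Return the squared reconstruction error of each row of X."""
    mean, components = model
    centered = X - mean
    # Encode onto the components, then decode back to the input space.
    reconstructed = centered @ components.T @ components
    return ((centered - reconstructed) ** 2).sum(axis=1)


def iterative_cleaning(X, n_components=1, n_iterations=3, drop_fraction=0.02):
    """Repeatedly fit on the current training set and drop its worst-fit rows.

    Assumption: rows with the highest reconstruction error are the most
    likely anomalies, so removing them yields a cleaner training set.
    """
    train = X
    for _ in range(n_iterations):
        model = fit_linear_autoencoder(train, n_components)
        errors = reconstruction_error(train, model)
        n_drop = int(np.ceil(drop_fraction * len(train)))
        keep_idx = np.argsort(errors)[: len(train) - n_drop]
        train = train[keep_idx]
    # Refit on the cleaned set; this model scores unseen data.
    return fit_linear_autoencoder(train, n_components), train


# Synthetic, unlabeled data: 500 "normal" rows near a line in 3-D,
# plus 10 scattered "fraud" rows (about 2% of the data).
rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, -1.0]) / np.sqrt(6.0)
normal = rng.normal(size=(500, 1)) * direction + 0.05 * rng.normal(size=(500, 3))
fraud = rng.normal(scale=3.0, size=(10, 3))
X = np.vstack([normal, fraud])

model, cleaned = iterative_cleaning(X)
scores = reconstruction_error(X, model)  # higher score = more anomalous
```

No labels are used during fitting or cleaning. Labels would only be needed afterwards, to evaluate the anomaly scores with metrics such as AUC and TPR.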
