Suspicious Bank Card Transaction Recognition Based on K-means Clustering and Random Forest Algorithm
Authors: Y. Liu, Zeshen Tang, Wenjie Zheng
Published in: 2019 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2019-11-01
DOI: 10.1109/ICIIBMS46890.2019.8991451 (https://doi.org/10.1109/ICIIBMS46890.2019.8991451)
Abstract
Suspicious transactions are buried in massive volumes of transaction data, where they cause incalculable losses and risks, yet they are very difficult to detect. To identify suspicious transactions in such data both accurately and quickly, this paper combines the k-means algorithm with the random forest algorithm to address the data imbalance that arises when identifying suspicious transactions in bank accounts, and proposes a suspicious transaction detection model. AUC (Area Under the Curve) and Recall, among other indicators, serve as the performance evaluation criteria for imbalanced-data classification. Experiments on a bank account suspicious transaction dataset from the Kaggle data platform show that the proposed detection model raises AUC by 5% and F1-measure by 1%. These results indicate that the method has some reference value for suspicious transaction recognition and makes better use of limited information. This can improve the speed and accuracy with which banks predict suspicious transaction events and, to some extent, reduce the operating cost and risk of the banking sector.
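The abstract does not say how k-means and random forest are combined, so the following is only a minimal sketch of one common arrangement, not the authors' method. It assumes k-means is used to undersample the majority (normal) class cluster by cluster, after which a random forest is trained on the rebalanced data and scored with AUC, Recall, and F1. The synthetic data from `make_classification` stands in for the Kaggle dataset, and the names `kmeans_undersample` and `run_demo` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split


def kmeans_undersample(X, y, n_clusters=10, seed=0):
    """Shrink the majority class (label 0) to roughly the minority size.

    Assumption: majority rows are clustered with k-means and each cluster
    contributes samples in proportion to its size, so the reduced set still
    covers the different kinds of normal transactions.
    """
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == 0)
    min_idx = np.flatnonzero(y == 1)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X[maj_idx])
    keep = []
    for c in range(n_clusters):
        members = maj_idx[labels == c]
        if len(members) == 0:
            continue
        # Proportional quota, with at least one sample per non-empty cluster.
        quota = max(1, round(len(min_idx) * len(members) / len(maj_idx)))
        keep.append(rng.choice(members, size=min(quota, len(members)),
                               replace=False))
    idx = np.concatenate([min_idx, *keep])
    return X[idx], y[idx]


def run_demo(seed=0):
    """Train on rebalanced data; evaluate on the untouched imbalanced test set."""
    # Synthetic stand-in: about 3% of rows are labelled suspicious (class 1).
    X, y = make_classification(n_samples=5000, n_features=10,
                               weights=[0.97, 0.03], random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    X_bal, y_bal = kmeans_undersample(X_train, y_train, seed=seed)
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_bal, y_bal)
    scores = model.predict_proba(X_test)[:, 1]  # probability of class 1
    pred = model.predict(X_test)
    return {
        "auc": roc_auc_score(y_test, scores),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }


if __name__ == "__main__":
    print(run_demo())
```

Only the training split is rebalanced in this sketch. The test split keeps its original class ratio so that AUC, Recall, and F1 reflect the imbalanced setting the abstract describes.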