Machine Learning Technique for Credit Card Scam Detection

Journal of Technology Informatics and Engineering, published 2022-04-26, DOI: 10.51903/jtie.v1i1.143

Fujiama Diapoldo Silalahi, Toni Wijanarko Adi Putra, Edy Siswanto

Citations: 0

Abstract

Credit card (CC) scams are a growing nuisance in financial markets. CC scams are increasing rapidly and causing large financial losses for organizations, governments, and public institutions, especially now that e-commerce purchases can be paid for much more easily through digital payment methods. The purpose of this study is therefore to detect scam CC transactions in a given dataset by performing a predictive investigation of the CC transaction data with machine learning techniques. The approach uses predictive models, namely logistic regression (LR-M), random forest (RF), and XGBoost, combined with several resampling techniques, to predict whether CC transactions are scams or genuine. Model performance was evaluated with recall, precision, F1-score, and the precision-recall (PR) and ROC curves. The experimental results show that random forest combined with the hybrid resampling approach of SMOTE and Tomek links removal works better than the other models. Without resampling, the random forest and XGBoost models are preferred over LR-M because of their higher overall F1-score, which demonstrates the strength of ensemble techniques in achieving better performance even in the presence of class imbalance. When combined with random undersampling (Ran-Under), every approach achieves a high recall score (RS) but performs very poorly in terms of precision. Compared with the corresponding model without resampling, precision and RS do not improve when Tomek links removal is used. RF and XGBoost perform quite well in terms of F1-score (F1-S) when random oversampling (Ran-Over) is used. SMOTE increases the recall score of random forest and XGBoost, but their precision score (PS) decreases slightly. Overall, when the hybrid of Tomek links removal and SMOTE was applied with random forest, it gave a good balance of precision and RS in terms of PR-AUC. XGBoost failed to improve the PS even though the same resampling technique was used. For future research, a cost-sensitive learning method could be applied to account for the cost of misclassifications. Future work should also take changes in scam behavior over time into account when developing predictive models. In addition, much larger datasets are needed so that the handling of non-stationary properties in CC scam detection can be studied in more detail.
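The resampling strategies compared in the abstract (random undersampling, random oversampling, SMOTE, Tomek links removal, and the SMOTE + Tomek hybrid) can be illustrated with a small, dependency-free Python sketch. The abstract does not say how the authors implemented them; for real use, the imbalanced-learn library provides `RandomUnderSampler`, `RandomOverSampler`, `SMOTE`, `TomekLinks`, and `SMOTETomek`. This sketch assumes label 1 marks scam (minority) transactions and label 0 genuine ones, uses brute-force Euclidean nearest neighbours, and removes only the genuine member of each Tomek link; all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

# Assumption: label 1 = scam (minority class), label 0 = genuine (majority class).
Point = List[float]
Dataset = Tuple[List[Point], List[int]]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def random_undersample(X: List[Point], y: List[int], rng: random.Random) -> Dataset:
    """Ran-Under: keep all scams and an equally sized random subset of genuine rows."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    kept = minority + rng.sample(majority, len(minority))
    return [X[i] for i in kept], [y[i] for i in kept]


def random_oversample(X: List[Point], y: List[int], rng: random.Random) -> Dataset:
    """Ran-Over: duplicate randomly chosen scam rows until the classes are balanced."""
    minority = [i for i, label in enumerate(y) if label == 1]
    extra = [rng.choice(minority) for _ in range(y.count(0) - y.count(1))]
    return X + [X[i] for i in extra], y + [1] * len(extra)


def smote(X: List[Point], y: List[int], n_new: int, k: int, rng: random.Random) -> Dataset:
    """Core SMOTE idea: interpolate between a scam row and one of its k nearest scam neighbours."""
    minority = [X[i] for i, label in enumerate(y) if label == 1]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: euclidean(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [1] * n_new


def nearest_index(X: List[Point], i: int) -> int:
    """Brute-force index of the nearest other row to row i."""
    return min((j for j in range(len(X)) if j != i), key=lambda j: euclidean(X[i], X[j]))


def tomek_links(X: List[Point], y: List[int]) -> List[Tuple[int, int]]:
    """Pairs of opposite-class rows that are each other's nearest neighbour."""
    nn = [nearest_index(X, i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_majority(X: List[Point], y: List[int]) -> Dataset:
    """Tomek links removal: drop the genuine (label 0) member of every link."""
    drop = {i if y[i] == 0 else j for i, j in tomek_links(X, y)}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X: List[Point], y: List[int], k: int = 3, seed: int = 0) -> Dataset:
    """Hybrid: SMOTE up to class balance, then clean the result with Tomek links removal."""
    rng = random.Random(seed)
    X_res, y_res = smote(X, y, y.count(0) - y.count(1), k, rng)
    return remove_tomek_majority(X_res, y_res)
```

On the toy 1-D data `X = [[0.0], [0.1], [1.0], [1.05], [2.0]]` with `y = [0, 0, 0, 1, 1]`, `tomek_links(X, y)` returns `[(2, 3)]`: the genuine point at 1.0 and the scam point at 1.05 are mutual nearest neighbours, so `remove_tomek_majority` drops the genuine one. Resampling of this kind is normally applied to the training split only, so that the test data keeps the real class imbalance.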
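The abstract's observation that random undersampling gives a high recall but poor precision follows directly from how the metrics are defined. The sketch below computes precision, recall, and F1 from hard predictions, ROC-AUC as the probability that a random scam is scored above a random genuine transaction, and average precision as one common step-wise estimate of PR-AUC. The function names are hypothetical, the average-precision helper assumes distinct scores, and the abstract does not specify exactly how its PR-AUC was computed.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives, with label 1 (scam) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    tp, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random scam outscores a random genuine transaction (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: List[int], scores: List[float]) -> float:
    """Step-wise PR-AUC estimate: mean precision at the rank of each scam (assumes distinct scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy case: 10 scams among 1,000 transactions; 200 are flagged, including every scam.
y_true = [1] * 10 + [0] * 990
y_pred = [1] * 200 + [0] * 800
print(precision_recall_f1(y_true, y_pred))
```

In this toy case recall is 1.0 but precision is only 10 / 200 = 0.05, so the F1-score is about 0.095. That is the pattern the abstract reports for Ran-Under: a model trained on balanced data flags many genuine transactions once it meets the real, heavily imbalanced class distribution.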
