Multiple behavioral models: A Divide and Conquer strategy to fraud detection in financial data streams

2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K). Pub Date: 2015-11-12. DOI: 10.5220/0005637104960503

Roberto Saia, Ludovico Boratto, S. Carta


Citations: 8

Abstract

The rapid, exponential growth of e-commerce, driven both by the new opportunities offered by the Internet and by the spread of debit and credit cards in online purchases, has sharply increased the number of frauds, causing large economic losses to the businesses involved. Designing effective strategies to face this problem is, however, particularly challenging because of several factors, such as the heterogeneity and non-stationary distribution of the data stream and the presence of an imbalanced class distribution. The problem is further complicated by the scarcity of public datasets, a consequence of confidentiality issues, which prevents researchers from verifying new strategies in many data contexts. Unlike the canonical state-of-the-art strategies, which define a single model based on the past transactions of the users, we follow a Divide and Conquer strategy: we define multiple models (user behavioral patterns) and use them to evaluate a new transaction in order to detect potential fraud attempts. Some parameters of this process can be tuned to adapt the sensitivity of the models to the operating environment. Because our models are built only from a user's past legitimate transactions, and do not need to be trained on both legitimate and fraudulent ones, we can operate proactively, detecting fraudulent transactions of kinds that have never occurred in the past. This way of proceeding also overcomes the data imbalance problem that afflicts machine learning approaches. The proposed approach is evaluated by comparing it with Random Forests, one of the best-performing state-of-the-art approaches, on a real-world credit card dataset.
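The abstract does not say how the behavioral models are built or how a new transaction is compared against them, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes that a transaction is reduced to a single amount, that each model summarizes one consecutive slice of a user's legitimate history by its mean and standard deviation, and that a transaction is flagged when fewer than `min_matches` models accept it. The names `BehavioralModel`, `build_models`, `is_suspicious`, `slice_size`, `tolerance`, and `min_matches` are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass
class BehavioralModel:
    """Summary of one slice of a user's legitimate transaction amounts (assumed form)."""
    center: float
    spread: float

    def accepts(self, amount: float, tolerance: float) -> bool:
        # The amount fits this model if it lies within `tolerance`
        # spreads of the slice's mean amount.
        return abs(amount - self.center) <= tolerance * self.spread


def build_models(legit_amounts: Sequence[float], slice_size: int) -> List[BehavioralModel]:
    """Divide the legitimate history into consecutive slices, one model per slice."""
    models = []
    for start in range(0, len(legit_amounts), slice_size):
        chunk = legit_amounts[start:start + slice_size]
        # Avoid a zero spread when every amount in the slice is identical.
        spread = max(pstdev(chunk), 1e-9)
        models.append(BehavioralModel(mean(chunk), spread))
    return models


def is_suspicious(amount: float, models: Sequence[BehavioralModel],
                  tolerance: float = 3.0, min_matches: int = 1) -> bool:
    """Flag a transaction that too few behavioral models accept."""
    matches = sum(m.accepts(amount, tolerance) for m in models)
    return matches < min_matches


# Illustrative data only: two spending habits in one user's legitimate history.
history = [20, 22, 19, 21, 200, 210, 190, 205]
models = build_models(history, slice_size=4)
```

In this sketch `tolerance` and `min_matches` play the role of the sensitivity parameters mentioned in the abstract: raising `min_matches` or lowering `tolerance` makes the detector stricter. Only legitimate transactions are needed to build the models, so no fraud examples, and therefore no class balancing, are involved.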

