DeepGRASS: Graph, Sequence and Scaled Embeddings on large scale transactions data

2021 Swedish Workshop on Data Science (SweDS). Pub Date: 2021-12-02. DOI: 10.1109/SweDS53855.2021.9638270

Mahesh Balan Umaithanu, Vignesh Ravichandran, M. Rohith Srinivaas, Venkat Subramanian Selvaraj



Abstract

Representation learning has redefined large-scale data mining applications. High-dimensional embeddings learn complex associations that transcend human cognitive understanding, and they have achieved great success in business applications that encounter the curse of dimensionality, including fin-tech. Different algorithms learn embeddings that capture different types of associations, so it would be useful to learn embeddings that capture multi-dimensional associations holistically. In this paper, we propose DeepGRASS, an algorithm that embeds financial transactions using graph and sequence-based topologies. Our results show that these embeddings learn associations that are comprehensive, holistic, and multi-dimensional. We deploy DeepGRASS at PayPal and train it on a multitude of transaction data with multi-dimensional features. The algorithm is two-fold: it embeds a bipartite graph with customer and merchant nodes and, in parallel, learns sequential associations from historical transactions along with other transactional features. These embeddings are then scaled and combined to learn multi-dimensional associations. We tested the approach on different predictive applications and found that the learning is generic and achieves benchmark performance across different predictive contexts. Based on offline metrics, back-tests, and sensitivity analysis on offline transaction data, we find very strong evidence that these embeddings provide the highest AUC score in predictive applications and the highest coefficient of determination in explaining variance, and that the features explain different types of associations. To our knowledge, this is the first application of embeddings that learn both graph and sequence-based associations on large-scale financial transaction data, and it paves the way for a new generation of feature engineering in fin-tech.
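The two-fold structure described in the abstract (a graph view, a sequence view, then scaling and combining) can be illustrated with a small sketch. The abstract does not specify the embedding methods, so everything here is an assumption: the toy transactions are made up, raw bipartite edge counts stand in for a learned graph embedding, recency-weighted merchant counts stand in for a learned sequence model, and z-scoring followed by concatenation stands in for the paper's scaling and combination step. All names (`graph_view`, `sequence_view`, `scale`, `combine`) are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical toy data: (customer, merchant) pairs in time order.
TRANSACTIONS = [
    ("c1", "m1"), ("c2", "m1"), ("c1", "m2"),
    ("c3", "m2"), ("c2", "m3"), ("c3", "m3"), ("c1", "m3"),
]
CUSTOMERS = sorted({c for c, _ in TRANSACTIONS})
MERCHANTS = sorted({m for _, m in TRANSACTIONS})


def graph_view():
    """One row per customer: edge weights to each merchant in the bipartite graph.

    Assumption: raw edge counts stand in for a learned graph embedding.
    """
    rows = {c: [0.0] * len(MERCHANTS) for c in CUSTOMERS}
    for c, m in TRANSACTIONS:
        rows[c][MERCHANTS.index(m)] += 1.0
    return rows


def sequence_view(decay=0.5):
    """One row per customer: recency-weighted merchant counts over the ordered history.

    Assumption: exponential decay stands in for a learned sequence model.
    """
    history = {c: [] for c in CUSTOMERS}
    for c, m in TRANSACTIONS:
        history[c].append(m)
    rows = {}
    for c, seq in history.items():
        row = [0.0] * len(MERCHANTS)
        # The most recent transaction has age 0 and therefore weight 1.
        for age, m in enumerate(reversed(seq)):
            row[MERCHANTS.index(m)] += decay ** age
        rows[c] = row
    return rows


def scale(rows):
    """Z-score each column so both views share a comparable range."""
    cols = list(zip(*(rows[c] for c in CUSTOMERS)))
    # A constant column has zero spread; divide by 1.0 instead to avoid 0/0.
    stats = [(mean(col), pstdev(col) or 1.0) for col in cols]
    return {
        c: [(v - mu) / sd for v, (mu, sd) in zip(rows[c], stats)]
        for c in CUSTOMERS
    }


def combine():
    """Concatenate the scaled graph and sequence views into one feature row per customer."""
    g, s = scale(graph_view()), scale(sequence_view())
    return {c: g[c] + s[c] for c in CUSTOMERS}


features = combine()
```

Each customer ends up with one feature row whose first half comes from the graph view and whose second half comes from the sequence view. Scaling before concatenation keeps either view from dominating a downstream model purely because of its numeric range.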

