DeepGRASS: Graph, Sequence and Scaled Embeddings on large scale transactions data
Authors: Mahesh Balan Umaithanu, Vignesh Ravichandran, M. Rohith Srinivaas, Venkat Subramanian Selvaraj
Published in: 2021 Swedish Workshop on Data Science (SweDS), 2021-12-02
DOI: 10.1109/SweDS53855.2021.9638270 (https://doi.org/10.1109/SweDS53855.2021.9638270)
Citation count: 0
Abstract
Representation learning has redefined large-scale data mining applications. High-dimensional embeddings learn complex associations that go beyond human cognitive understanding, and they have achieved great success in business applications that encounter the curse of dimensionality, including fin-tech. Different algorithms learn embeddings that capture different types of associations, so it would be useful to learn embeddings that capture multi-dimensional associations holistically. In this paper, we propose DeepGRASS, an algorithm that embeds financial transactions using graph- and sequence-based topologies. Our results show that these embeddings learn associations that are comprehensive, holistic, and multi-dimensional. We deploy DeepGRASS at PayPal and train it on a multitude of transaction data with multi-dimensional features. The algorithm is two-fold: it embeds a bipartite graph with customer and merchant nodes and, in parallel, learns sequential associations from historical transactions along with other transactional features. These embeddings are then scaled and combined to learn multi-dimensional associations. We tested the embeddings on different predictive applications and find that the learning is generic and shows benchmark performance in different predictive contexts. Based on offline metrics, back-tests, and sensitivity analysis on offline transaction data, we find very strong evidence that these embeddings provide the highest AUC score in predictive applications and the highest coefficient of determination in explaining variance, and that the features explain different types of associations. To our knowledge, this is the first application of embeddings that learn both graph- and sequence-based associations on large-scale financial transaction data, and it paves the way for a new generation of feature engineering in fin-tech.
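The two-fold structure described in the abstract (a bipartite customer-merchant graph embedding, a sequence embedding learned in parallel, then scaling and combining) can be illustrated with a minimal sketch. The abstract does not specify the models used, so everything here is an assumption for illustration only: truncated SVD of the transaction-count matrix stands in for the graph embedding, a recency-weighted average of merchant vectors stands in for the sequence model, and z-score standardization stands in for the scaling step. The function names, the `decay` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def graph_embeddings(transactions, n_customers, n_merchants, dim):
    """Embed the customer-merchant bipartite graph.

    Assumption: truncated SVD of the transaction-count matrix is used
    as a simple stand-in for whatever graph model the paper uses.
    """
    adj = np.zeros((n_customers, n_merchants))
    for customer, merchant in transactions:
        adj[customer, merchant] += 1.0  # edge weight = transaction count
    u, s, vt = np.linalg.svd(adj, full_matrices=False)
    scale = np.sqrt(s[:dim])
    return u[:, :dim] * scale, vt[:dim].T * scale

def sequence_embeddings(transactions, merchant_vecs, n_customers, decay=0.5):
    """Summarize each customer's ordered history.

    Assumption: a recency-weighted mean of merchant vectors replaces a
    learned sequence model; transactions are in chronological order.
    """
    out = np.zeros((n_customers, merchant_vecs.shape[1]))
    weight_sum = np.zeros(n_customers)
    for customer, merchant in transactions:
        out[customer] = decay * out[customer] + merchant_vecs[merchant]
        weight_sum[customer] = decay * weight_sum[customer] + 1.0
    seen = weight_sum > 0
    out[seen] = out[seen] / weight_sum[seen][:, None]
    return out

def scale_and_combine(*blocks):
    """Standardize each embedding block column-wise, then concatenate."""
    scaled = []
    for block in blocks:
        std = block.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns at zero after centering
        scaled.append((block - block.mean(axis=0)) / std)
    return np.hstack(scaled)

# Toy (customer_id, merchant_id) pairs in chronological order.
transactions = [(0, 0), (0, 1), (1, 1), (2, 2), (1, 2), (0, 1), (2, 0)]
cust_graph, merch_graph = graph_embeddings(transactions, 3, 3, dim=2)
cust_seq = sequence_embeddings(transactions, merch_graph, 3)
features = scale_and_combine(cust_graph, cust_seq)
print(features.shape)  # (3, 4)
```

Standardizing each block before concatenation keeps one embedding family from dominating downstream models purely because of its numeric range, which is one plausible reading of the "scaled and combined" step.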