The Shape of Money Laundering: Subgraph Representation Learning on the Blockchain with the Elliptic2 Dataset

Claudio Bellei, Muhua Xu, Ross Phillips, Tom Robinson, Mark Weber, Tim Kaler, Charles E. Leiserson, Arvind, Jie Chen
arXiv - QuantFin - General Finance, arXiv:2404.19109, published 2024-04-29

Abstract

Subgraph representation learning is a technique for analyzing local structures (or shapes) within complex networks. Enabled by recent developments in scalable Graph Neural Networks (GNNs), this approach encodes relational information at a subgroup level (multiple connected nodes) rather than at a node level of abstraction. We posit that certain domain applications, such as anti-money laundering (AML), are inherently subgraph problems and mainstream graph techniques have been operating at a suboptimal level of abstraction. This is due in part to the scarcity of annotated datasets of real-world size and complexity, as well as the lack of software tools for managing subgraph GNN workflows at scale. To enable work in fundamental algorithms as well as domain applications in AML and beyond, we introduce Elliptic2, a large graph dataset containing 122K labeled subgraphs of Bitcoin clusters within a background graph consisting of 49M node clusters and 196M edge transactions. The dataset provides subgraphs known to be linked to illicit activity for learning the set of "shapes" that money laundering exhibits in cryptocurrency and accurately classifying new criminal activity. Along with the dataset we share our graph techniques, software tooling, promising early experimental results, and new domain insights already gleaned from this approach. Taken together, we find immediate practical value in this approach and the potential for a new standard in anti-money laundering and forensic analytics in cryptocurrencies and other financial networks.
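
To make the node-level versus subgraph-level distinction concrete, here is a minimal sketch in plain Python. It is not the authors' method or tooling. The toy graph, the feature values, the subgraph names, and the "illicit"/"licit" labels are all invented for illustration, and a single round of mean-neighbor aggregation stands in for a real GNN layer. The point is only that the label attaches to a set of connected nodes, so the model needs one embedding per subgraph rather than one per node.

```python
from collections import defaultdict

# Toy background graph: nodes stand in for Bitcoin clusters, edges for
# transactions. All values are made up; this is not Elliptic2 data.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]
node_features = {
    0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0],
    3: [1.0, 1.0], 4: [0.0, 0.0], 5: [2.0, 1.0],
}

# Labeled subgraphs: each entry is (set of nodes, one label for the whole set).
# The names and labels are hypothetical.
subgraphs = {
    "sg_a": ({0, 1, 2}, "illicit"),
    "sg_b": ({3, 4, 5}, "licit"),
}


def build_adjacency(edge_list):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def propagate(features, adj):
    """One round of mean aggregation over each node and its neighbors.

    This is a simplified stand-in for a GNN message-passing layer; it lets
    each node's representation absorb context from the background graph.
    """
    out = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in sorted(adj[node])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def subgraph_embedding(nodes, features):
    """Mean-pool node representations into a single subgraph vector."""
    rows = [features[n] for n in sorted(nodes)]
    return [sum(col) / len(rows) for col in zip(*rows)]


adj = build_adjacency(edges)
hidden = propagate(node_features, adj)

# One embedding per labeled subgraph: this is the unit a classifier would see.
embeddings = {
    name: subgraph_embedding(nodes, hidden)
    for name, (nodes, _label) in subgraphs.items()
}

for name, (nodes, label) in subgraphs.items():
    print(name, label, embeddings[name])
```

In this sketch the propagation step runs over the whole background graph before pooling, so nodes outside a labeled subgraph (for example, node 4 as a neighbor of node 1) still influence that subgraph's embedding. A node-level approach would instead stop at `hidden` and classify each node separately.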