Detecting DeFi Fraud With a Graph-Transformer Language Model

IF 8.0 | CAS Tier 1 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS)
Wei Ma;Junjie Shi;Jiaxi Qiu;Cong Wu;Jing Chen;Lingxiao Jiang;Shangqing Liu;Yang Liu;Yang Xiang
{"title":"用图转换语言模型检测DeFi欺诈","authors":"Wei Ma;Junjie Shi;Jiaxi Qiu;Cong Wu;Jing Chen;Lingxiao Jiang;Shangqing Liu;Yang Liu;Yang Xiang","doi":"10.1109/TIFS.2025.3612184","DOIUrl":null,"url":null,"abstract":"With the rapid development of blockchain technology, the widespread adoption of smart contracts—particularly in decentralized finance (DeFi) applications—has introduced significant security challenges, such as reentrancy attacks, phishing, and Sybil attacks. To address these issues, we propose a novel model called TrxGNNBERT, which combines Graph Neural Network (GNN) and the Transformer architecture to effectively handle both graph-structured and textual data. This combination enhances the detection of suspicious transactions and accounts on blockchain platforms like Ethereum. TrxGNNBERT was pre-trained using a masked language model (MLM) on a dataset of 60,000 Ethereum transactions by randomly masking the attributes of nodes and edges, thereby capturing deep semantic relationships and structural information. In this work, we constructed transaction subgraphs, using a GNN module to enrich the embedding representations, which were then fed into a Transformer encoder. The experimental results demonstrate that TrxGNNBERT outperforms various baseline models—including DeepWalk, Trans2Vec, Role2Vec, GCN, GAT, GraphSAGE, CodeBERT, GraphCodeBERT, Zipzap and BERT4ETH—in detecting suspicious transactions and accounts. Specifically, TrxGNNBERT achieved an accuracy of 0.755 and an F1 score of 0.756 on the TrxLarge dataset; an accuracy of 0.903 and an F1 score of 0.894 on the TrxSmall dataset; and an accuracy of 0.790 and an F1 score of 0.781 on the AddrDec dataset. We also explored different pre-training configurations and strategies, comparing the performance of encoder-based versus decoder-based Transformer structures. The results indicate that pre-training improves downstream task performance, with encoder-based structures outperforming decoder-based ones. Through ablation studies, we found that node-level information and subgraph structures are critical for achieving optimal performance in transaction classification tasks. When key features were removed, the model performance declined considerably, demonstrating the importance of each component of our method. These findings offer valuable insights for future research, suggesting further improvements in node attribute representation and subgraph extraction.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"10051-10065"},"PeriodicalIF":8.0000,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Detecting DeFi Fraud With a Graph-Transformer Language Model\",\"authors\":\"Wei Ma;Junjie Shi;Jiaxi Qiu;Cong Wu;Jing Chen;Lingxiao Jiang;Shangqing Liu;Yang Liu;Yang Xiang\",\"doi\":\"10.1109/TIFS.2025.3612184\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the rapid development of blockchain technology, the widespread adoption of smart contracts—particularly in decentralized finance (DeFi) applications—has introduced significant security challenges, such as reentrancy attacks, phishing, and Sybil attacks. To address these issues, we propose a novel model called TrxGNNBERT, which combines Graph Neural Network (GNN) and the Transformer architecture to effectively handle both graph-structured and textual data. 
This combination enhances the detection of suspicious transactions and accounts on blockchain platforms like Ethereum. TrxGNNBERT was pre-trained using a masked language model (MLM) on a dataset of 60,000 Ethereum transactions by randomly masking the attributes of nodes and edges, thereby capturing deep semantic relationships and structural information. In this work, we constructed transaction subgraphs, using a GNN module to enrich the embedding representations, which were then fed into a Transformer encoder. The experimental results demonstrate that TrxGNNBERT outperforms various baseline models—including DeepWalk, Trans2Vec, Role2Vec, GCN, GAT, GraphSAGE, CodeBERT, GraphCodeBERT, Zipzap and BERT4ETH—in detecting suspicious transactions and accounts. Specifically, TrxGNNBERT achieved an accuracy of 0.755 and an F1 score of 0.756 on the TrxLarge dataset; an accuracy of 0.903 and an F1 score of 0.894 on the TrxSmall dataset; and an accuracy of 0.790 and an F1 score of 0.781 on the AddrDec dataset. We also explored different pre-training configurations and strategies, comparing the performance of encoder-based versus decoder-based Transformer structures. The results indicate that pre-training improves downstream task performance, with encoder-based structures outperforming decoder-based ones. Through ablation studies, we found that node-level information and subgraph structures are critical for achieving optimal performance in transaction classification tasks. When key features were removed, the model performance declined considerably, demonstrating the importance of each component of our method. These findings offer valuable insights for future research, suggesting further improvements in node attribute representation and subgraph extraction.\",\"PeriodicalId\":13492,\"journal\":{\"name\":\"IEEE Transactions on Information Forensics and Security\",\"volume\":\"20 \",\"pages\":\"10051-10065\"},\"PeriodicalIF\":8.0000,\"publicationDate\":\"2025-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Information Forensics and Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11173704/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11173704/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0

Abstract

With the rapid development of blockchain technology, the widespread adoption of smart contracts, particularly in decentralized finance (DeFi) applications, has introduced significant security challenges such as reentrancy attacks, phishing, and Sybil attacks. To address these issues, we propose a novel model called TrxGNNBERT, which combines a Graph Neural Network (GNN) with the Transformer architecture to effectively handle both graph-structured and textual data. This combination enhances the detection of suspicious transactions and accounts on blockchain platforms such as Ethereum. TrxGNNBERT was pre-trained with a masked language model (MLM) objective on a dataset of 60,000 Ethereum transactions, randomly masking the attributes of nodes and edges so that the model captures deep semantic relationships and structural information. In this work, we constructed transaction subgraphs and used a GNN module to enrich their embedding representations, which were then fed into a Transformer encoder. The experimental results demonstrate that TrxGNNBERT outperforms various baseline models, including DeepWalk, Trans2Vec, Role2Vec, GCN, GAT, GraphSAGE, CodeBERT, GraphCodeBERT, Zipzap, and BERT4ETH, in detecting suspicious transactions and accounts. Specifically, TrxGNNBERT achieved an accuracy of 0.755 and an F1 score of 0.756 on the TrxLarge dataset; an accuracy of 0.903 and an F1 score of 0.894 on the TrxSmall dataset; and an accuracy of 0.790 and an F1 score of 0.781 on the AddrDec dataset. We also explored different pre-training configurations and strategies, comparing the performance of encoder-based and decoder-based Transformer structures. The results indicate that pre-training improves downstream task performance, with encoder-based structures outperforming decoder-based ones. Through ablation studies, we found that node-level information and subgraph structures are critical for achieving optimal performance in transaction classification tasks. When key features were removed, model performance declined considerably, demonstrating the importance of each component of our method. These findings offer valuable insights for future research, suggesting further improvements in node attribute representation and subgraph extraction.
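
To make the described pipeline concrete, below is a minimal PyTorch sketch of the flow the abstract outlines: node attributes of a transaction subgraph are randomly masked (the MLM pre-training objective), enriched by one round of GNN message passing, and then fed as a token sequence into a Transformer encoder whose [CLS] output drives classification. The module names, dimensions, mean-aggregation GNN, and 15% masking rate are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of a GNN-then-Transformer model with MLM-style
# pre-training over node attributes; all details are assumptions, not the
# paper's actual TrxGNNBERT code.
import torch
import torch.nn as nn

class TrxGNNBERTSketch(nn.Module):
    def __init__(self, feat_dim=64, hidden=128, heads=4, layers=2, n_classes=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden)
        # One round of mean-aggregation message passing (GCN-style).
        self.gnn = nn.Linear(2 * hidden, hidden)
        self.cls = nn.Parameter(torch.zeros(1, 1, hidden))  # learnable [CLS]
        enc_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.mlm_head = nn.Linear(hidden, feat_dim)   # reconstruct masked attrs
        self.clf_head = nn.Linear(hidden, n_classes)  # suspicious vs. benign

    def forward(self, x, adj, mask=None):
        # x:   (N, feat_dim) node-attribute matrix of one transaction subgraph
        # adj: (N, N) row-normalized adjacency; mask: (N,) bool, masked nodes
        h = self.proj(x)
        if mask is not None:
            h = h.masked_fill(mask.unsqueeze(-1), 0.0)  # hide masked attributes
        neigh = adj @ h                                  # aggregate neighbors
        h = torch.relu(self.gnn(torch.cat([h, neigh], dim=-1)))
        seq = torch.cat([self.cls, h.unsqueeze(0)], dim=1)  # prepend [CLS]
        z = self.encoder(seq)
        return self.mlm_head(z[0, 1:]), self.clf_head(z[0, 0])

# Pre-training step: mask random node attributes and reconstruct them.
model = TrxGNNBERTSketch()
x = torch.randn(10, 64)            # 10 nodes with 64-d attributes (toy data)
adj = torch.eye(10)                # stand-in adjacency for a tiny subgraph
mask = torch.rand(10) < 0.15       # BERT-style 15% masking rate (assumed)
recon, logits = model(x, adj, mask)
mlm_loss = ((recon[mask] - x[mask]) ** 2).mean()  # attribute-reconstruction loss
```

For fine-tuning on the downstream tasks (TrxLarge, TrxSmall, AddrDec), one would drop the reconstruction loss and train the classification head on labeled subgraphs with a cross-entropy objective.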
Source journal
IEEE Transactions on Information Forensics and Security (Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 7.40%
Annual articles: 234
Review time: 6.5 months
Journal introduction: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.