Malo in the Code Jungle: Explainable Fault Localization for Decentralized Applications

Impact factor 5.6; Zone 1 (Computer Science); JCR Q1: Computer Science, Software Engineering

IEEE Transactions on Software Engineering Pub Date : 2025-06-13 DOI:10.1109/TSE.2025.3578816

Hui Zhang;Jiajing Wu;Zhiying Wu;Zhe Chen;Dan Lin;Jiachi Chen;Yuren Zhou;Zibin Zheng

IEEE Transactions on Software Engineering, vol. 51, no. 7, pp. 2197-2210. https://ieeexplore.ieee.org/document/11034691/

Citations: 0

Abstract

Decentralized applications (DApps) have long been sitting ducks for hackers due to their valuable cryptocurrency assets, exposing them to various security risks. When a DApp is attacked, promptly identifying faults is crucial to minimizing financial losses and ensuring effective fault repair. However, existing fault localization methods, which mostly rely on code coverage, often fall short for DApps, particularly when only one fault case is available. Furthermore, according to a prior survey, most developers expect fault localization tools to provide reasonable explanations. In this paper, we present Malo, a method for DApp-specific explainable fault localization. It identifies fault functions through suspicious token transfer-guided analysis, and then employs Large Language Models (LLMs) to generate explanations for the identified fault functions. Specifically, Malo examines the function call traces and source code of fault cases to acquire internal knowledge, and also retrieves relevant project documents from the Web to obtain external knowledge. By integrating internal and external knowledge, Malo generates reasonable explanations for faults in DApps. Our evaluation on a dataset of 68 real-world DApp faults demonstrates that Malo can locate 62% of faults within the Top-5, 9% higher than the state-of-the-art method. The experimental results also demonstrate an alignment accuracy of 71% between the explanations generated by Malo and the ground truth. In addition, we conduct a user study, which confirms that explanations generated by Malo can aid developers in comprehending the root cause of faults. Our code and dataset are available online: https://github.com/SodalimeZero/Malo_Code.git.
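The abstract reports a Top-5 localization rate but does not spell out how it is computed. A minimal sketch, assuming the common definition (a fault counts as located if its ground-truth function appears among the first k entries of the ranked suspect list), might look like the following. The function name `top_k_rate` and the sample data are hypothetical and are not taken from Malo's code or dataset.

```python
from typing import Sequence


def top_k_rate(
    rankings: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of fault cases whose ground-truth function is ranked in the top k.

    rankings[i] is the ranked list of suspicious functions for fault case i,
    most suspicious first; truths[i] is the known faulty function for that case.
    This is an assumed, commonly used definition, not one confirmed by the paper.
    """
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not truths:
        return 0.0
    hits = sum(
        1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k]
    )
    return hits / len(truths)


if __name__ == "__main__":
    # Hypothetical rankings for three fault cases (illustrative names only).
    rankings = [
        ["swap", "withdraw"],
        ["mint", "burn", "stake", "claim", "sync", "deposit"],
        ["transferFrom"],
    ]
    truths = ["withdraw", "deposit", "flashLoan"]
    print(f"Top-5 rate: {top_k_rate(rankings, truths, k=5):.2f}")
```

In this toy data only the first case is a hit: `deposit` is ranked sixth in the second case, and `flashLoan` is absent from the third, so one of three faults is located within the Top-5.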


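Source journal

IEEE Transactions on Software Engineering (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 9.70

Self-citation rate: 10.80%

Articles published: 724

Review time: 6 months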

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.