Soley: Identification and Automated Detection of Logic Vulnerabilities in Ethereum Smart Contracts Using Large Language Models

arXiv - CS - Emerging Technologies Pub Date : 2024-06-24 DOI:arxiv-2406.16244

Majd Soud, Waltteri Nuutinen, Grischa Liebel



Abstract

Modern blockchains, such as Ethereum, support the deployment and execution of so-called smart contracts: autonomous digital programs that can hold significant cryptocurrency value. Executing a smart contract incurs gas costs, paid by users, which bound the contract's execution. Logic vulnerabilities in smart contracts can lead to financial losses and are often the root cause of high-impact cyberattacks. Our objective is threefold: (i) empirically investigate logic vulnerabilities in real-world smart contracts extracted from code changes on GitHub, (ii) introduce Soley, an automated method for detecting logic vulnerabilities in smart contracts that leverages Large Language Models (LLMs), and (iii) examine the mitigation strategies that smart contract developers employ to address these vulnerabilities in real-world scenarios. We obtained smart contracts and related code changes from GitHub. To address the first and third objectives, we qualitatively analyzed the available logic vulnerabilities using an open coding method, identifying the vulnerabilities and their mitigation strategies. For the second objective, we extracted various logic vulnerabilities, applied preprocessing techniques, and implemented and trained the proposed Soley model. We evaluated Soley alongside various LLMs and compared the results with the state-of-the-art baseline on the task of logic vulnerability detection. Our analysis identified nine novel logic vulnerabilities, which extend existing taxonomies. Furthermore, we present several mitigation strategies extracted from developer modifications observed in real-world scenarios. Soley outperforms existing methods in automatically identifying logic vulnerabilities. Interestingly, LLMs were effective at this task without extensive feature engineering.
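
As a rough illustration of how gas bounds execution, the sketch below meters a sequence of operations against a gas limit and computes the resulting fee. It is not taken from the paper: the function names, the per-operation costs, and the gas price are hypothetical, and the model is deliberately simplified (it ignores refunds and real EVM opcode pricing).

```python
class OutOfGas(Exception):
    """Raised when a simulated execution would exceed its gas limit."""


def run_with_gas(op_costs, gas_limit):
    """Consume gas per operation; abort once the limit would be exceeded.

    op_costs: iterable of illustrative (hypothetical) per-operation gas costs.
    Returns the total gas consumed if execution completes.
    """
    gas_used = 0
    for cost in op_costs:
        if gas_used + cost > gas_limit:
            raise OutOfGas(f"needed {gas_used + cost}, limit {gas_limit}")
        gas_used += cost
    return gas_used


def fee_in_wei(gas_used, gas_price_wei):
    """Fee paid by the sender: gas consumed times the price per unit of gas."""
    return gas_used * gas_price_wei


if __name__ == "__main__":
    used = run_with_gas([3, 5, 200], gas_limit=1000)
    print(used, fee_in_wei(used, gas_price_wei=10))
```

The point of the sketch is only that the gas limit caps how much work a call can do, which is why the abstract describes gas as defining the limits of a contract's execution.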

