Soley: Identification and Automated Detection of Logic Vulnerabilities in Ethereum Smart Contracts Using Large Language Models
Majd Soud, Waltteri Nuutinen, Grischa Liebel
arXiv - CS - Emerging Technologies, published 2024-06-24
https://doi.org/arxiv-2406.16244
Abstract
Modern blockchains, such as Ethereum, support the deployment and execution of
so-called smart contracts: autonomous digital programs that often hold
significant cryptocurrency value. Executing a smart contract costs gas, which
users pay and which bounds the contract's execution. Logic vulnerabilities in
smart contracts can lead to financial losses, and are often the root cause of
high-impact cyberattacks. Our objective is threefold: (i) empirically
investigate logic vulnerabilities in real-world smart contracts extracted from
code changes on GitHub, (ii) introduce Soley, an automated method for detecting
logic vulnerabilities in smart contracts, leveraging Large Language Models
(LLMs), and (iii) examine mitigation strategies employed by smart contract
developers to address these vulnerabilities in real-world scenarios. We
obtained smart contracts and related code changes from GitHub. To address the
first and third objectives, we qualitatively investigated available logic
vulnerabilities using an open coding method. We identified these
vulnerabilities and their mitigation strategies. For the second objective, we
extracted various logic vulnerabilities, applied preprocessing techniques, and
implemented and trained the proposed Soley model. We evaluated Soley along with
the performance of various LLMs and compared the results with the
state-of-the-art baseline on the task of logic vulnerability detection. Our
analysis identified nine novel logic vulnerabilities, which extend existing
taxonomies. Furthermore, we introduced
several mitigation strategies extracted from observed developer modifications
in real-world scenarios. Our Soley method outperforms existing methods in
automatically identifying logic vulnerabilities. Interestingly, LLMs were
effective at this task without requiring extensive feature engineering.
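
The abstract notes that the gas users pay bounds a contract's execution. As a rough illustration only, the toy model below shows one way to picture that metering. It is not the EVM and not part of Soley: the `execute` function, the `OutOfGas` exception, and the costs in `TOY_GAS_COSTS` are all hypothetical names and numbers chosen for this sketch.

```python
class OutOfGas(Exception):
    """Raised when execution would need more gas than the caller supplied."""


# Hypothetical per-operation costs, chosen for illustration only;
# this is NOT the real EVM gas schedule.
TOY_GAS_COSTS = {"add": 3, "store": 100}


def execute(ops, gas_limit, state):
    """Run (name, key, value) operations against a copy of `state`.

    Gas is charged before each step. If the next step would exceed
    `gas_limit`, OutOfGas is raised and the caller's `state` is left
    untouched, loosely mimicking a reverted transaction.
    Returns (new_state, gas_used) on success.
    """
    working = dict(state)  # operate on a copy so a failure changes nothing
    gas_used = 0
    for name, key, value in ops:
        cost = TOY_GAS_COSTS[name]
        if gas_used + cost > gas_limit:
            raise OutOfGas(f"needed {gas_used + cost} gas, limit is {gas_limit}")
        gas_used += cost
        if name == "store":
            working[key] = value
        elif name == "add":
            working[key] = working.get(key, 0) + value
    return working, gas_used


if __name__ == "__main__":
    start = {"balance": 10}
    ops = [("add", "balance", 5), ("store", "owner", "alice")]
    print(execute(ops, gas_limit=200, state=start))
    try:
        execute(ops, gas_limit=50, state=start)
    except OutOfGas as err:
        print("reverted:", err)
```

With a generous limit both operations run; with a limit of 50 the second operation cannot be paid for, so the call fails and the original state is unchanged.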