Smart contract vulnerabilities detection with bidirectional encoder representations from transformers and control flow graph
Peng Su, Jingyuan Hu. DOI: 10.1007/s00530-024-01406-9 (https://doi.org/10.1007/s00530-024-01406-9). Published 2024-07-10.
To date, smart contract vulnerability detection methods built on sequence-modal data and sequence models have been the most widely used. However, existing state-of-the-art methods overlook the fact that sequence-modal data loses structural and control flow information. Additionally, sequence models struggle to extract global features of smart contracts. Moreover, these methods rarely consider the impact of noisy data on vulnerability detection. To tackle these issues, we propose a smart contract vulnerability detection model based on bidirectional encoder representations from transformers (BERT) and the control flow graph (CFG). First, we design a denoising method suited to control flow graphs to reduce the impact of noisy data on vulnerability detection. Second, we design a novel method that parses the control flow graph into a BERT input form retaining both control flow and structural information. BERT is then fine-tuned to learn the potential vulnerability characteristics of smart contracts. In an empirical evaluation on a large-scale real-world dataset against five state-of-the-art baseline methods, our method achieves (1) the best performance among all compared methods; (2) 0.6–17.1% higher F1-score; (3) 0.7–16.7% higher accuracy; (4) 0.6–17% higher precision; and (5) 0.2–19.5% higher recall than the baselines.
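The abstract does not specify how the control flow graph is denoised or serialized, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a toy CFG of opcode blocks, treats "denoising" as simply dropping blocks unreachable from the entry block, and uses hypothetical marker tokens (`[B0]`, `[TO1]`) to keep block boundaries and edges visible in a flat, BERT-style token sequence.

```python
from collections import deque

# Hypothetical toy CFG: block id -> (opcode list, successor block ids).
CFG = {
    0: (["PUSH1", "CALLVALUE", "JUMPI"], [1, 2]),
    1: (["CALLER", "CALL", "SSTORE"], [3]),
    2: (["REVERT"], []),
    3: (["STOP"], []),
    4: (["INVALID"], []),  # unreachable from the entry block
}


def reachable_blocks(cfg, entry=0):
    """Breadth-first search from the entry block; returns ids in visit order."""
    seen, order, queue = {entry}, [], deque([entry])
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in cfg[node][1]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order


def linearize(cfg, entry=0):
    """Serialize reachable blocks into one token list with explicit edge tokens.

    Assumption: unreachable blocks are treated as noise and skipped.
    [B<i>] marks the start of block i, [TO<j>] marks an edge to block j.
    """
    tokens = ["[CLS]"]
    for block_id in reachable_blocks(cfg, entry):
        opcodes, successors = cfg[block_id]
        tokens.append(f"[B{block_id}]")
        tokens.extend(opcodes)
        tokens.extend(f"[TO{succ}]" for succ in successors)
        tokens.append("[SEP]")
    return tokens


tokens = linearize(CFG)
print(" ".join(tokens))
```

In a real pipeline the marker tokens would have to be added to the tokenizer vocabulary before fine-tuning; that step is omitted here because it depends on the chosen BERT implementation.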