基于深度学习的输入类型对智能合约漏洞检测性能的影响：初步研究

Information Pub Date : 2024-05-24 DOI:10.3390/info15060302

Izdehar M. Aldyaflah, Wenbing Zhao, Shunkun Yang, Xiong Luo

{"title":"基于深度学习的输入类型对智能合约漏洞检测性能的影响：初步研究","authors":"Izdehar M. Aldyaflah, Wenbing Zhao, Shunkun Yang, Xiong Luo","doi":"10.3390/info15060302","DOIUrl":null,"url":null,"abstract":"Stemming vulnerabilities out of a smart contract prior to its deployment is essential to ensure the security of decentralized applications. As such, numerous tools and machine-learning-based methods have been proposed to help detect vulnerabilities in smart contracts. Furthermore, various ways of encoding the smart contracts for analysis have also been proposed. However, the impact of these input methods has not been systematically studied, which is the primary goal of this paper. In this preliminary study, we experimented with four common types of input, including Word2Vec, FastText, Bag-of-Words (BoW), and Term Frequency–Inverse Document Frequency (TF-IDF). To focus on the comparison of these input types, we used the same deep-learning model, i.e., convolutional neural networks, in all experiments. Using a public dataset, we compared the vulnerability detection performance of the four input types both in the binary classification scenarios and the multiclass classification scenario. Our findings show that TF-IDF is the best overall input type among the four. TF-IDF has excellent detection performance in all scenarios: (1) it has the best F1 score and accuracy in binary classifications for all vulnerability types except for the delegate vulnerability where TF-IDF comes in a close second, and (2) it comes in a very close second behind BoW (within 0.8%) in the multiclass classification.","PeriodicalId":510156,"journal":{"name":"Information","volume":"7 40","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Impact of Input Types on Smart Contract Vulnerability Detection Performance Based on Deep Learning: A Preliminary Study\",\"authors\":\"Izdehar M. Aldyaflah, Wenbing Zhao, Shunkun Yang, Xiong Luo\",\"doi\":\"10.3390/info15060302\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Stemming vulnerabilities out of a smart contract prior to its deployment is essential to ensure the security of decentralized applications. As such, numerous tools and machine-learning-based methods have been proposed to help detect vulnerabilities in smart contracts. Furthermore, various ways of encoding the smart contracts for analysis have also been proposed. However, the impact of these input methods has not been systematically studied, which is the primary goal of this paper. In this preliminary study, we experimented with four common types of input, including Word2Vec, FastText, Bag-of-Words (BoW), and Term Frequency–Inverse Document Frequency (TF-IDF). To focus on the comparison of these input types, we used the same deep-learning model, i.e., convolutional neural networks, in all experiments. Using a public dataset, we compared the vulnerability detection performance of the four input types both in the binary classification scenarios and the multiclass classification scenario. Our findings show that TF-IDF is the best overall input type among the four. TF-IDF has excellent detection performance in all scenarios: (1) it has the best F1 score and accuracy in binary classifications for all vulnerability types except for the delegate vulnerability where TF-IDF comes in a close second, and (2) it comes in a very close second behind BoW (within 0.8%) in the multiclass classification.\",\"PeriodicalId\":510156,\"journal\":{\"name\":\"Information\",\"volume\":\"7 40\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/info15060302\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/info15060302","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

要确保去中心化应用的安全性，必须在部署智能合约之前将漏洞扼杀在摇篮里。因此，人们提出了许多工具和基于机器学习的方法来帮助检测智能合约中的漏洞。此外，人们还提出了各种对智能合约进行编码的分析方法。然而，这些输入方法的影响尚未得到系统研究，而这正是本文的主要目标。在这项初步研究中，我们尝试了四种常见的输入类型，包括 Word2Vec、FastText、词包（BoW）和词频-反向文档频率（TF-IDF）。为了集中比较这些输入类型，我们在所有实验中使用了相同的深度学习模型，即卷积神经网络。我们使用公共数据集，比较了四种输入类型在二元分类场景和多分类场景下的漏洞检测性能。我们的研究结果表明，TF-IDF 是四种输入类型中整体性能最好的一种。TF-IDF 在所有场景下都具有出色的检测性能：(1) 在所有漏洞类型的二进制分类中，TF-IDF 的 F1 分数和准确率都是最高的，只有委托漏洞紧随其后；(2) 在多类分类中，TF-IDF 紧随 BoW 之后（0.8% 以内）。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

The Impact of Input Types on Smart Contract Vulnerability Detection Performance Based on Deep Learning: A Preliminary Study

Stemming vulnerabilities out of a smart contract prior to its deployment is essential to ensure the security of decentralized applications. As such, numerous tools and machine-learning-based methods have been proposed to help detect vulnerabilities in smart contracts. Furthermore, various ways of encoding the smart contracts for analysis have also been proposed. However, the impact of these input methods has not been systematically studied, which is the primary goal of this paper. In this preliminary study, we experimented with four common types of input, including Word2Vec, FastText, Bag-of-Words (BoW), and Term Frequency–Inverse Document Frequency (TF-IDF). To focus on the comparison of these input types, we used the same deep-learning model, i.e., convolutional neural networks, in all experiments. Using a public dataset, we compared the vulnerability detection performance of the four input types both in the binary classification scenarios and the multiclass classification scenario. Our findings show that TF-IDF is the best overall input type among the four. TF-IDF has excellent detection performance in all scenarios: (1) it has the best F1 score and accuracy in binary classifications for all vulnerability types except for the delegate vulnerability where TF-IDF comes in a close second, and (2) it comes in a very close second behind BoW (within 0.8%) in the multiclass classification.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Information

自引率

0.00%

发文量