New Tricks to Old Codes: Can AI Chatbots Replace Static Code Analysis Tools?

Proceedings of the 2023 European Interdisciplinary Cybersecurity Conference · Pub Date: 2023-06-14 · DOI: 10.1145/3590777.3590780

Omer Said Ozturk, E. Ekmekcioglu, Orçun Çetin, B. Arief, J. Hernandez-Castro


Citations: 1

Abstract



The prevalence and significance of web services in our daily lives make it imperative to ensure that they are – as much as possible – free from vulnerabilities. However, developing a complex piece of software free from any security vulnerabilities is hard, if not impossible. One way to progress towards achieving this holy grail is by using static code analysis tools to root out any common or known vulnerabilities that may accidentally be introduced during the development process. Static code analysis tools have significantly contributed to addressing the problem above, but are imperfect. It is conceivable that static code analysis can be improved by using AI-powered tools, which have recently increased in popularity. However, there is still very little work in analysing both types of tools’ effectiveness, and this is a research gap that our paper aims to fill. We carried out a study involving 11 static code analysers, and one AI-powered chatbot named ChatGPT, to assess their effectiveness in detecting 92 vulnerabilities representing the top 10 known vulnerability categories in web applications, as classified by OWASP. We particularly focused on PHP vulnerabilities since it is one of the most widely used languages in web applications. However, it has few security mechanisms to help its software developers. We found that the success rate of ChatGPT in terms of finding security vulnerabilities in PHP is around 62-68%. At the same time, the best traditional static code analyser tested has a success rate of 32%. Even combining several traditional static code analysers (with the best features on certain aspects of detection) would only achieve a rate of 53%, which is still significantly lower than ChatGPT’s success rate. Nonetheless, ChatGPT has a very high false positive rate of 91%. In comparison, the worst false positive rate of any traditional static code analyser is 82%. 
These findings highlight the promising potential of ChatGPT for improving the static code analysis process but reveal certain caveats (especially regarding accuracy) in its current state. Our findings suggest that one interesting possibility to explore in future works would be to pick the best of both worlds by combining traditional static code analysers with ChatGPT to find security vulnerabilities more effectively.
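
One way to read the "combined" figure in the abstract is as a union of per-tool detections: a vulnerability counts as found if at least one analyser flags it. The abstract does not say how the combination was computed, so this reading is an assumption. The minimal Python sketch below illustrates it; the tool names and detection sets are hypothetical and are not data from the paper. Only the test-set size of 92 comes from the abstract.

```python
from typing import Dict, Set

TOTAL_VULNERABILITIES = 92  # size of the test set reported in the abstract


def detection_rate(found: Set[int], total: int = TOTAL_VULNERABILITIES) -> float:
    """Fraction of the known vulnerabilities that were flagged."""
    return len(found) / total


def combined_detections(per_tool: Dict[str, Set[int]]) -> Set[int]:
    """Union of detections: a vulnerability counts if any tool flags it."""
    combined: Set[int] = set()
    for found in per_tool.values():
        combined |= found
    return combined


# Hypothetical detections (vulnerability IDs 0..91); NOT data from the paper.
per_tool = {
    "tool_a": set(range(0, 20)),
    "tool_b": set(range(15, 40)),
    "tool_c": set(range(35, 46)),
}

best_single = max(detection_rate(found) for found in per_tool.values())
combined = detection_rate(combined_detections(per_tool))

print(f"best single tool: {best_single:.0%}")
print(f"combined tools:   {combined:.0%}")
```

Because the tools overlap in what they detect, the combined rate is lower than the sum of the individual rates. This is consistent with the abstract's observation that combining several analysers improves on the best single tool but does not close the gap entirely.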

