Balancing Security and Correctness in Code Generation: An Empirical Study on Commercial Large Language Models

Impact Factor: 5.3 · CAS Tier 3 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Gavin S. Black;Bhaskar P. Rimal;Varghese Mathew Vaidyan
{"title":"Balancing Security and Correctness in Code Generation: An Empirical Study on Commercial Large Language Models","authors":"Gavin S. Black;Bhaskar P. Rimal;Varghese Mathew Vaidyan","doi":"10.1109/TETCI.2024.3446695","DOIUrl":null,"url":null,"abstract":"Large language models (LLMs) continue to be adopted for a multitude of previously manual tasks, with code generation as a prominent use. Multiple commercial models have seen wide adoption due to the accessible nature of the interface. Simple prompts can lead to working solutions that save developers time. However, the generated code has a significant challenge with maintaining security. There are no guarantees on code safety, and LLM responses can readily include known weaknesses. To address this concern, our research examines different prompt types for shaping responses from code generation tasks to produce safer outputs. The top set of common weaknesses is generated through unconditioned prompts to create vulnerable code across multiple commercial LLMs. These inputs are then paired with different contexts, roles, and identification prompts intended to improve security. Our findings show that the inclusion of appropriate guidance reduces vulnerabilities in generated code, with the choice of model having the most significant effect. Additionally, timings are presented to demonstrate the efficiency of singular requests that limit the number of model interactions.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 1","pages":"419-430"},"PeriodicalIF":5.3000,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10658990/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Large language models (LLMs) continue to be adopted for a multitude of previously manual tasks, with code generation as a prominent use. Multiple commercial models have seen wide adoption due to the accessible nature of their interfaces. Simple prompts can lead to working solutions that save developers time. However, maintaining security in the generated code remains a significant challenge: there are no guarantees of code safety, and LLM responses can readily include known weaknesses. To address this concern, our research examines different prompt types for shaping the responses to code generation tasks toward safer outputs. A top set of common weaknesses is first elicited through unconditioned prompts that produce vulnerable code across multiple commercial LLMs. These inputs are then paired with different context, role, and identification prompts intended to improve security. Our findings show that including appropriate guidance reduces vulnerabilities in generated code, with the choice of model having the most significant effect. Additionally, timings are presented to demonstrate the efficiency of single requests that limit the number of model interactions.
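
To make the prompting methodology described above more concrete, the sketch below shows how a single unconditioned code-generation task might be paired with context, role, and identification prefixes, each sent as one self-contained request. This is a minimal illustration only: the prefix wordings, the example task, and the query_model placeholder are assumptions for exposition and are not the actual prompts, tasks, or model interfaces used in the paper.

```python
# Illustrative sketch only: prompt wordings, prefix categories, and the
# query_model() placeholder are assumptions, not the paper's exact setup.
from typing import Callable, Dict

# A base, unconditioned code-generation request (the kind used to elicit
# common weaknesses in generated code, e.g., injection-style flaws).
BASE_TASK = ("Write a Python function that builds an SQL query "
             "from a user-supplied search term.")

# Hypothetical security-oriented prefixes, loosely mirroring the
# "context", "role", and "identification" prompt categories.
GUIDANCE_PREFIXES = {
    "unconditioned": "",
    "context": "The code will run in a production web service handling untrusted input. ",
    "role": "You are a security engineer who follows secure-coding best practices. ",
    "identification": "Identify and avoid any known CWE weaknesses (e.g., SQL injection) in your answer. ",
}


def build_prompts(task: str) -> Dict[str, str]:
    """Pair one task with each guidance prefix as a single, self-contained prompt."""
    return {name: prefix + task for name, prefix in GUIDANCE_PREFIXES.items()}


def run_experiment(task: str, query_model: Callable[[str], str]) -> Dict[str, str]:
    """Send each prompt variant once (one model interaction per variant) and
    collect the generated code for later vulnerability analysis."""
    return {name: query_model(prompt) for name, prompt in build_prompts(task).items()}


if __name__ == "__main__":
    # Stand-in for a call to a commercial LLM API; replace with a real client.
    fake_model = lambda prompt: f"# model response to: {prompt[:60]}..."
    for variant, code in run_experiment(BASE_TASK, fake_model).items():
        print(f"--- {variant} ---\n{code}\n")
```

Keeping each variant to a single request matches the efficiency point raised in the abstract: security guidance is folded into the original prompt rather than added through follow-up interactions with the model.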
Source Journal: IEEE Transactions on Emerging Topics in Computational Intelligence
CiteScore: 10.30
Self-citation rate: 7.50%
Annual publication volume: 147 articles
About the Journal: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronic-only publication and publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.