{"title":"大型语言模型的配置验证","authors":"Xinyu Lian, Yinfang Chen, Runxiang Cheng, Jie Huang, Parth Thakkar, Tianyin Xu","doi":"arxiv-2310.09690","DOIUrl":null,"url":null,"abstract":"Misconfigurations are the major causes of software failures. Existing\nconfiguration validation techniques rely on manually written rules or test\ncases, which are expensive to implement and maintain, and are hard to be\ncomprehensive. Leveraging machine learning (ML) and natural language processing\n(NLP) for configuration validation is considered a promising direction, but has\nbeen facing challenges such as the need of not only large-scale configuration\ndata, but also system-specific features and models which are hard to\ngeneralize. Recent advances in Large Language Models (LLMs) show the promises\nto address some of the long-lasting limitations of ML/NLP-based configuration\nvalidation techniques. In this paper, we present an exploratory analysis on the\nfeasibility and effectiveness of using LLMs like GPT and Codex for\nconfiguration validation. Specifically, we take a first step to empirically\nevaluate LLMs as configuration validators without additional fine-tuning or\ncode generation. We develop a generic LLM-based validation framework, named\nCiri, which integrates different LLMs. Ciri devises effective prompt\nengineering with few-shot learning based on both valid configuration and\nmisconfiguration data. Ciri also validates and aggregates the outputs of LLMs\nto generate validation results, coping with known hallucination and\nnondeterminism of LLMs. We evaluate the validation effectiveness of Ciri on\nfive popular LLMs using configuration data of six mature, widely deployed\nopen-source systems. Our analysis (1) confirms the potential of using LLMs for\nconfiguration validation, (2) understands the design space of LLMbased\nvalidators like Ciri, especially in terms of prompt engineering with few-shot\nlearning, and (3) reveals open challenges such as ineffectiveness in detecting\ncertain types of misconfigurations and biases to popular configuration\nparameters.","PeriodicalId":501333,"journal":{"name":"arXiv - CS - Operating Systems","volume":"56 5","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Configuration Validation with Large Language Models\",\"authors\":\"Xinyu Lian, Yinfang Chen, Runxiang Cheng, Jie Huang, Parth Thakkar, Tianyin Xu\",\"doi\":\"arxiv-2310.09690\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Misconfigurations are the major causes of software failures. Existing\\nconfiguration validation techniques rely on manually written rules or test\\ncases, which are expensive to implement and maintain, and are hard to be\\ncomprehensive. Leveraging machine learning (ML) and natural language processing\\n(NLP) for configuration validation is considered a promising direction, but has\\nbeen facing challenges such as the need of not only large-scale configuration\\ndata, but also system-specific features and models which are hard to\\ngeneralize. Recent advances in Large Language Models (LLMs) show the promises\\nto address some of the long-lasting limitations of ML/NLP-based configuration\\nvalidation techniques. In this paper, we present an exploratory analysis on the\\nfeasibility and effectiveness of using LLMs like GPT and Codex for\\nconfiguration validation. 
Specifically, we take a first step to empirically\\nevaluate LLMs as configuration validators without additional fine-tuning or\\ncode generation. We develop a generic LLM-based validation framework, named\\nCiri, which integrates different LLMs. Ciri devises effective prompt\\nengineering with few-shot learning based on both valid configuration and\\nmisconfiguration data. Ciri also validates and aggregates the outputs of LLMs\\nto generate validation results, coping with known hallucination and\\nnondeterminism of LLMs. We evaluate the validation effectiveness of Ciri on\\nfive popular LLMs using configuration data of six mature, widely deployed\\nopen-source systems. Our analysis (1) confirms the potential of using LLMs for\\nconfiguration validation, (2) understands the design space of LLMbased\\nvalidators like Ciri, especially in terms of prompt engineering with few-shot\\nlearning, and (3) reveals open challenges such as ineffectiveness in detecting\\ncertain types of misconfigurations and biases to popular configuration\\nparameters.\",\"PeriodicalId\":501333,\"journal\":{\"name\":\"arXiv - CS - Operating Systems\",\"volume\":\"56 5\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-10-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Operating Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2310.09690\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Operating Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2310.09690","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Configuration Validation with Large Language Models
Misconfigurations are among the major causes of software failures. Existing configuration validation techniques rely on manually written rules or test cases, which are expensive to implement and maintain, and are hard to make comprehensive. Leveraging machine learning (ML) and natural language processing (NLP) for configuration validation is considered a promising direction, but it faces challenges such as the need for not only large-scale configuration data, but also system-specific features and models that are hard to generalize. Recent advances in Large Language Models (LLMs) show promise in addressing some of the long-standing limitations of ML/NLP-based configuration validation techniques. In this paper, we present an exploratory analysis of the feasibility and effectiveness of using LLMs such as GPT and Codex for configuration validation. Specifically, we take a first step toward empirically evaluating LLMs as configuration validators, without additional fine-tuning or code generation. We develop a generic LLM-based validation framework, named Ciri, which integrates different LLMs. Ciri devises effective prompt engineering with few-shot learning based on both valid configuration and misconfiguration data.
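As a rough illustration of this prompt-engineering idea (a minimal sketch, not Ciri's actual prompt format: the shot wording, the HDFS parameter examples, and the helper name build_prompt are all assumptions), a few-shot validation prompt can pair one valid configuration and one misconfiguration with the configuration under test:

    # A minimal sketch of few-shot prompt construction for configuration
    # validation. Shot format and parameter examples are illustrative
    # assumptions, not Ciri's actual prompts.

    VALID_SHOT = """\
    Configuration:
      dfs.replication = 3
    Answer: valid"""

    MISCONFIG_SHOT = """\
    Configuration:
      dfs.replication = 0
    Answer: invalid. dfs.replication must be a positive integer."""

    def build_prompt(system: str, config_snippet: str) -> str:
        """Assemble a few-shot prompt from one valid shot, one
        misconfiguration shot, and the configuration under test."""
        task = (f"You are a configuration validator for {system}. "
                "Decide whether the configuration below is valid; if not, "
                "name the faulty parameter and explain why.")
        return "\n\n".join([task, VALID_SHOT, MISCONFIG_SHOT,
                            f"Configuration:\n{config_snippet}\nAnswer:"])

    # Example: validate a (deliberately bogus) HDFS block size setting.
    print(build_prompt("HDFS", "  dfs.blocksize = -64"))

Including a misconfiguration shot alongside a valid one gives the model a concrete example of the expected "invalid" answer format, rather than only demonstrating the "valid" case.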
Ciri also validates and aggregates the outputs of LLMs to generate validation results, coping with the known hallucination and nondeterminism of LLMs.
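One plausible way to realize this checking-and-aggregation step (again a sketch under assumptions: query_llm is a hypothetical model call, and Ciri's actual answer format and voting rule may differ) is to discard malformed answers and take a majority vote over repeated queries:

    from collections import Counter
    from typing import Callable, Optional

    def parse_verdict(raw: str) -> Optional[str]:
        # Accept only outputs that follow the expected answer format;
        # anything else is treated as a malformed (possibly hallucinated)
        # response and discarded.
        answer = raw.strip().lower()
        if answer.startswith("valid"):
            return "valid"
        if answer.startswith("invalid"):
            return "invalid"
        return None

    def validate(prompt: str, query_llm: Callable[[str], str],
                 votes: int = 5) -> str:
        # Sample the model several times (nondeterminism makes repeated
        # queries disagree) and keep the majority among well-formed verdicts.
        verdicts = [v for v in (parse_verdict(query_llm(prompt))
                                for _ in range(votes)) if v is not None]
        if not verdicts:
            return "undecided"  # every sample was malformed
        return Counter(verdicts).most_common(1)[0][0]

Voting over several samples trades extra queries for robustness: a single hallucinated verdict no longer flips the final result.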
We evaluate the validation effectiveness of Ciri on five popular LLMs, using configuration data from six mature, widely deployed open-source systems. Our analysis (1) confirms the potential of using LLMs for configuration validation, (2) maps out the design space of LLM-based validators like Ciri, especially in terms of prompt engineering with few-shot learning, and (3) reveals open challenges, such as ineffectiveness in detecting certain types of misconfigurations and biases toward popular configuration parameters.