J. Galindo, Antonio J. Dominguez, Jules White, David Benavides
{"title":"大型语言模型生成有意义的特征模型实例","authors":"J. Galindo, Antonio J. Dominguez, Jules White, David Benavides","doi":"10.1145/3579027.3608973","DOIUrl":null,"url":null,"abstract":"Feature models are the \"de facto\" standard for representing variability in software-intensive systems. Automated analysis of feature models is the computer-aided extraction of information of feature models and is used in testing, maintenance, configuration, and derivation, among other tasks. Testing the analyses of feature models often requires relying on a large number of models that are as realistic as possible. There exist different proposals to generate synthetic feature models using random techniques or metamorphic relations; however, the existing methods do not take into account the semantics of the concepts of the domain that are being represented and the interrelations between them, leading to less realistic feature models. In this paper, we propose a novel approach that uses Large Language Models (LLMs), such as Codex or GPT-3, to generate realistic feature models that preserve semantic coherence while maintaining syntactic validity. The approach automatically generates instances of feature models from a given domain. Concretely, two language models were used, first OpenAI's Codex to generate new instances of feature models using the Universal Variability Language (UVL) syntax and then Cohere's semantic analysis to verify if the newly introduced concepts are from the same domain. This approach enabled the generation of 90% of valid instances according to the UVL syntax. In addition, the valid models score well on model complexity metrics, and the generated features mirror the domain of the original UVL instance used as prompts. With this work, we envision a new thread of research where variability is generated and analyzed using LLMs. This opens the door for a new generation of techniques and tools for variability management.","PeriodicalId":322542,"journal":{"name":"Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Large Language Models to generate meaningful feature model instances\",\"authors\":\"J. Galindo, Antonio J. Dominguez, Jules White, David Benavides\",\"doi\":\"10.1145/3579027.3608973\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Feature models are the \\\"de facto\\\" standard for representing variability in software-intensive systems. Automated analysis of feature models is the computer-aided extraction of information of feature models and is used in testing, maintenance, configuration, and derivation, among other tasks. Testing the analyses of feature models often requires relying on a large number of models that are as realistic as possible. There exist different proposals to generate synthetic feature models using random techniques or metamorphic relations; however, the existing methods do not take into account the semantics of the concepts of the domain that are being represented and the interrelations between them, leading to less realistic feature models. In this paper, we propose a novel approach that uses Large Language Models (LLMs), such as Codex or GPT-3, to generate realistic feature models that preserve semantic coherence while maintaining syntactic validity. The approach automatically generates instances of feature models from a given domain. Concretely, two language models were used, first OpenAI's Codex to generate new instances of feature models using the Universal Variability Language (UVL) syntax and then Cohere's semantic analysis to verify if the newly introduced concepts are from the same domain. This approach enabled the generation of 90% of valid instances according to the UVL syntax. In addition, the valid models score well on model complexity metrics, and the generated features mirror the domain of the original UVL instance used as prompts. With this work, we envision a new thread of research where variability is generated and analyzed using LLMs. This opens the door for a new generation of techniques and tools for variability management.\",\"PeriodicalId\":322542,\"journal\":{\"name\":\"Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3579027.3608973\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3579027.3608973","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Large Language Models to generate meaningful feature model instances
Feature models are the "de facto" standard for representing variability in software-intensive systems. Automated analysis of feature models is the computer-aided extraction of information of feature models and is used in testing, maintenance, configuration, and derivation, among other tasks. Testing the analyses of feature models often requires relying on a large number of models that are as realistic as possible. There exist different proposals to generate synthetic feature models using random techniques or metamorphic relations; however, the existing methods do not take into account the semantics of the concepts of the domain that are being represented and the interrelations between them, leading to less realistic feature models. In this paper, we propose a novel approach that uses Large Language Models (LLMs), such as Codex or GPT-3, to generate realistic feature models that preserve semantic coherence while maintaining syntactic validity. The approach automatically generates instances of feature models from a given domain. Concretely, two language models were used, first OpenAI's Codex to generate new instances of feature models using the Universal Variability Language (UVL) syntax and then Cohere's semantic analysis to verify if the newly introduced concepts are from the same domain. This approach enabled the generation of 90% of valid instances according to the UVL syntax. In addition, the valid models score well on model complexity metrics, and the generated features mirror the domain of the original UVL instance used as prompts. With this work, we envision a new thread of research where variability is generated and analyzed using LLMs. This opens the door for a new generation of techniques and tools for variability management.