Leveraging synthetic trace generation of modeling operations for intelligent modeling assistants using large language models
Authors: Vittoriano Muttillo, Claudio Di Sipio, Riccardo Rubei, Luca Berardinelli
Journal: Information and Software Technology, volume 186, Article 107806 (impact factor 4.3, JCR Q2, Computer Science, Information Systems)
Publication date: 2025-06-16
DOI: 10.1016/j.infsof.2025.107806
URL: https://www.sciencedirect.com/science/article/pii/S0950584925001454
Context:
Generative AI models are now applied across software engineering tasks, from requirements specification to code development. Model-Driven Engineering (MDE) is a paradigm that leverages software models as primary artifacts to automate tasks. In this respect, modelers have started to investigate the interplay between traditional MDE practices and Large Language Models (LLMs) to push automation further. Although powerful, LLMs exhibit limitations that undermine the quality of generated modeling artifacts, e.g., hallucination or incorrect formatting. Modeling assistants, which help modelers in their daily tasks, are trained on recorded modeling operations, and that recording relies on human activity. However, such techniques require a large amount of training data, which is often unavailable for reasons such as security or privacy constraints.
Objective:
In this paper, we propose an extension of a conceptual MDE framework, called MASTER-LLM, that combines different MDE tools and paradigms to support industrial and academic practitioners.
Method:
MASTER-LLM comprises a modeling environment that acts as the active context in which a dedicated component records modeling operations. A modeling assistant trained on these past operations then provides model completion. To speed up recording and data collection, different LLMs are used to generate a new dataset of modeling events.
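The record-then-complete loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ModelingEvent` schema, the `TraceRecorder` and `NextOperationSuggester` classes, and the frequency-count "training" are hypothetical stand-ins, not the MASTER-LLM components, and the element names only loosely echo CAEX-style vocabulary.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelingEvent:
    """One recorded modeling operation (hypothetical schema, not from the paper)."""
    operation: str      # e.g. "ADD", "SET", "REMOVE"
    element_type: str   # e.g. "InternalElement", "Attribute"
    element_name: str


class TraceRecorder:
    """Collects events emitted by a modeling environment into an ordered trace."""

    def __init__(self):
        self.trace = []

    def record(self, operation, element_type, element_name):
        event = ModelingEvent(operation, element_type, element_name)
        self.trace.append(event)
        return event


class NextOperationSuggester:
    """Toy stand-in for a trained assistant: counts which (operation, type)
    pair most often follows each pair in past traces."""

    def __init__(self):
        self.followers = defaultdict(Counter)

    def train(self, traces):
        for trace in traces:
            for current, following in zip(trace, trace[1:]):
                key = (current.operation, current.element_type)
                nxt = (following.operation, following.element_type)
                self.followers[key][nxt] += 1

    def suggest(self, last_event):
        counts = self.followers.get((last_event.operation, last_event.element_type))
        if not counts:
            return None  # nothing similar was seen in past traces
        return counts.most_common(1)[0][0]


# Usage: record a short session, train on it, then ask for a completion hint.
recorder = TraceRecorder()
recorder.record("ADD", "InternalElement", "Robot")
recorder.record("ADD", "Attribute", "payload")
recorder.record("SET", "Attribute", "payload")
recorder.record("ADD", "InternalElement", "Conveyor")
last = recorder.record("ADD", "Attribute", "speed")

suggester = NextOperationSuggester()
suggester.train([recorder.trace])
suggestion = suggester.suggest(last)
print(suggestion)
```

In this sketch a synthetic trace produced by an LLM would simply be another list of `ModelingEvent` objects passed to `train`, which is why a shared event schema for human and generated traces is a convenient design choice.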
Results:
To evaluate the feasibility of MASTER-LLM in practice, we experiment with two modeling environments, i.e., CAEX and HEPSYCODE, employed in industrial use cases within European projects. We investigate how the examined LLMs can generate realistic modeling operations in different domains.
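A first, purely syntactic check on whether generated operations are usable is to parse each line of LLM output and reject anything malformed or referring to unknown element types. The sketch below assumes a hypothetical one-event-per-line text format (`OPERATION ElementType name`) and an illustrative allow-list of element types; neither is taken from the paper, and a realistic check would derive the allowed types from the metamodel of the modeling environment.

```python
import re

# Hypothetical line format: "<OPERATION> <ElementType> <name>", e.g. "ADD Attribute payload".
EVENT_PATTERN = re.compile(r"^(ADD|SET|REMOVE)\s+(\w+)\s+(\w+)$")

# Illustrative allow-list; a real one would come from the metamodel.
KNOWN_TYPES = {"InternalElement", "Attribute", "ExternalInterface"}


def validate_trace(raw_text, known_types=KNOWN_TYPES):
    """Split LLM output into accepted events and rejected lines with a reason."""
    accepted, rejected = [], []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = EVENT_PATTERN.match(line)
        if match is None:
            rejected.append((line, "malformed"))
        elif match.group(2) not in known_types:
            rejected.append((line, "unknown element type"))
        else:
            accepted.append(match.groups())
    return accepted, rejected


# Usage: two valid events, one chatty line, one made-up element type.
sample = """
ADD InternalElement Robot
ADD Attribute payload
Sure! Here is the next operation:
ADD Gearbox9000 axis
"""
accepted, rejected = validate_trace(sample)
print(accepted)
print(rejected)
```

Passing this filter only shows that a trace is well-formed; judging whether it is realistic for a given domain still needs comparison against human-recorded operations.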
Conclusion:
We show that synthetic traces can be effectively used when the application domain is less complex, while complex scenarios require human-based operations or a mixed approach according to data availability. However, generative AI models must be assessed using proper methodologies to avoid security issues in industrial domains.
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.