Generating domain models from natural language text using NLP: a benchmark dataset and experimental comparison of tools

Fatma Bozyigit, Tolgahan Bardakci, Alireza Khalilipour, Moharram Challenger, Guus Ramackers, Önder Babur, Michel R. V. Chaudron

Software and Systems Modeling, published 2024-05-08. DOI: 10.1007/s10270-024-01176-y
Citations: 0
Abstract
A software requirements specification describes users' needs and expectations of a target system. Requirements documents are typically represented as unstructured natural language text. Such texts are the basis for various subsequent activities in software development, such as software analysis and design. As part of software analysis, domain models are created that describe the key concepts and the relations between them. Since the analysis process is performed manually by business analysts, it is time-consuming and may introduce mistakes. Recently, researchers have worked toward automating the synthesis of domain models from textual software requirements. Current studies on this topic have limitations in terms of the volume and heterogeneity of experimental datasets. To remedy this, we provide a curated dataset of software requirements to be utilized as a benchmark by algorithms that transform textual requirements documents into domain models. We present a detailed evaluation of two text-to-model approaches: one based on a large language model (ChatGPT) and one building on grammatical rules (txt2Model). Our evaluation reveals that both tools yield promising results with relatively high F-scores for modeling the classes, attributes, methods, and relationships, with txt2Model performing better than ChatGPT on average. Both tools have relatively lower performance and higher variance when it comes to the relation types. We believe our dataset and experimental evaluation pave the way to advance the field of automated model generation from requirements.
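The abstract reports per-element-type F-scores (classes, attributes, methods, relationships) for each tool. As a rough illustration of how such a score can be computed when comparing an extracted domain model against a gold-standard model, here is a minimal sketch; the function name, the set-based matching, and the example class names are assumptions for illustration, not the paper's actual evaluation code.

```python
# Hypothetical sketch: precision, recall, and F1 for one element type
# (e.g. classes) via exact-name set matching. The paper's own matching
# procedure may differ (e.g. partial or fuzzy matches).

def f_score(predicted, gold):
    """Return (precision, recall, F1) for predicted vs. gold elements."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # elements the tool extracted correctly
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative example: classes extracted by a tool vs. a reference model.
pred_classes = {"Customer", "Order", "Invoice"}
gold_classes = {"Customer", "Order", "Product"}
p, r, f = f_score(pred_classes, gold_classes)  # each is 2/3 here
```

Averaging such scores over all requirements documents in the benchmark, separately per element type, would yield the kind of per-category comparison the abstract summarizes.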
Journal description:
We invite authors to submit papers that discuss and analyze research challenges and experiences pertaining to software and system modeling languages, techniques, tools, practices and other facets. The following are some of the topic areas that are of special interest, but the journal publishes on a wide range of software and systems modeling concerns:
Domain-specific models and modeling standards;
Model-based testing techniques;
Model-based simulation techniques;
Formal syntax and semantics of modeling languages such as the UML;
Rigorous model-based analysis;
Model composition, refinement and transformation;
Software language engineering;
Modeling languages in science and engineering;
Language adaptation and composition;
Metamodeling techniques;
Measuring the quality of models and languages;
Ontological approaches to model engineering;
Generating test and code artifacts from models;
Model synthesis;
Methodology;
Model development tool environments;
Modeling cyber-physical systems;
Data-intensive modeling;
Derivation of explicit models from data;
Case studies and experience reports with significant modeling lessons learned;
Comparative analyses of modeling languages and techniques;
Scientific assessment of modeling practices.