Automating software size measurement with language models: Insights from industrial case studies

IF 4.1 · Zone 2, Computer Science · Q1, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software · Published: 2025-10-08 · DOI: 10.1016/j.jss.2025.112638

Hüseyin Ünlü, Samet Tenekeci, Dhia Eddine Kennouche, Onur Demirörs

Volume 231, Article 112638 · https://www.sciencedirect.com/science/article/pii/S0164121225003073

Citations: 0

Abstract

Objective software size measurement is critical for accurate effort estimation, yet many organizations avoid it because of its high cost, the expertise it requires, and the time-consuming manual effort involved. This often leads to vague predictions, poor planning, and project overruns. To address this challenge, we investigate the use of pre-trained language models (BERT and SE-BERT) to automate size measurement from textual requirements using the COSMIC and MicroM methods. We constructed one heterogeneous dataset and two industrial datasets, each manually measured by experienced analysts. Models were evaluated in three settings: (i) generic model evaluation, where the models are trained and tested on heterogeneous data; (ii) internal evaluation, where the models are trained and tested on organization-specific data; and (iii) external evaluation, where generic models are tested on organization-specific data. Results show that organization-specific models significantly outperform generic models, indicating that aligning training data with the target organization’s requirement style is critical for accuracy. SE-BERT, a domain-adapted variant of BERT, improves performance, particularly in low-resource settings. These findings highlight the practical potential of tailoring training data to enable broader adoption of cost-effective software size measurement in industrial contexts.
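The three evaluation settings differ only in where the training and test requirements come from. The following is a minimal sketch that shows how a COSMIC size label is derived and how the three train/test pairs could be assembled. It assumes a hypothetical `Requirement` record annotated with COSMIC data movements. In COSMIC, each Entry, Exit, Read, or Write counts as 1 COSMIC Function Point (CFP). MicroM is not modeled here.

Several pieces are illustrative assumptions rather than the authors' pipeline: the schema, the `holdout` rule, the source tag `"heterogeneous"`, the organization name `OrgA`, and the choice to reuse the internal test split for external evaluation. The BERT/SE-BERT fine-tuning step is omitted entirely.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Movement(Enum):
    """COSMIC data-movement types; each identified movement counts as 1 CFP."""
    ENTRY = "E"
    EXIT = "X"
    READ = "R"
    WRITE = "W"


@dataclass
class Requirement:
    """One textual requirement with analyst-assigned movements (hypothetical schema)."""
    text: str
    source: str  # "heterogeneous" or an organization name (hypothetical labels)
    movements: List[Movement] = field(default_factory=list)

    @property
    def cfp(self) -> int:
        # COSMIC functional size: one CFP per data movement.
        return len(self.movements)


Split = Tuple[List[Requirement], List[Requirement]]


def holdout(items: List[Requirement], test_ratio: float = 0.2) -> Split:
    """Deterministic split: the last test_ratio of items (at least one) is held out."""
    n_test = max(1, round(len(items) * test_ratio))
    return items[:-n_test], items[-n_test:]


def evaluation_settings(data: List[Requirement], org: str) -> Dict[str, Split]:
    """Train/test pairs for the three settings named in the abstract."""
    generic = [r for r in data if r.source == "heterogeneous"]
    internal = [r for r in data if r.source == org]
    gen_train, gen_test = holdout(generic)
    org_train, org_test = holdout(internal)
    return {
        # (i) trained and tested on heterogeneous data
        "generic": (gen_train, gen_test),
        # (ii) trained and tested on one organization's data
        "internal": (org_train, org_test),
        # (iii) generic training data, organization test data
        # (assumption: same held-out split as the internal setting)
        "external": (gen_train, org_test),
    }


if __name__ == "__main__":
    order = Requirement(
        "The customer submits an order; the system stores it and shows a confirmation.",
        "OrgA",
        [Movement.ENTRY, Movement.WRITE, Movement.EXIT],  # illustrative labels only
    )
    print(order.cfp)  # 3
```

Settings (ii) and (iii) reuse the same held-out organizational test set. As a result, the internal-versus-external comparison is made on identical requirements, which is one way to keep a claim like "organization-specific models outperform generic models" like-for-like.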





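Source journal

Journal of Systems and Software (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 8.60

Self-citation rate: 5.70%

Articles published: 193

Review time: 16 weeks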

Journal description: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes

The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.