{"title":"Hybrid approach for multilevel multi-class requirement classification: Impact of stop-word removal and data augmentation","authors":"Jasleen Kaur , Chanchal Roy","doi":"10.1016/j.jss.2025.112594","DOIUrl":null,"url":null,"abstract":"<div><div>Requirement classification in software engineering is essential for effective development. Automating this process reduces human effort and enhances decision-making. Previous studies experimented with machine learning and deep learning models to classify requirements. This novel research fills that gap by evaluating transformer-based models and a proposed Hybrid Stacked Model for multilevel, multi-class classification task. To address the limitations of existing software requirement datasets (imbalanced dataset, insufficient granularity, real world examples), we combined instances from the PROMISE_exp dataset, PURE corpus, and 20 manually collected software requirement specifications (SRS) documents using a Boolean keyword search to create a multilevel, multi-class dataset. These 3072 combined requirements are organized into a two-level hierarchy: Level 1 (functional (FR)/non-functional (NFR)); Level 2 (FRs: core functional (CFR)/derived functional (DFR)/system integration (SI)/external dependency (ED); NFRs: product (PR)/organizational (OR)/external (ER)). We applied BERT-based context-aware text augmentation to address class imbalance by expanding the dataset to 3343 instances. This study also investigates the effects of domain-specific stopword removal and text augmentation on model performance. Results show that text augmentation boosts accuracy by 0.2–3.76% across all models. Stopword removal enhances precision and recall by reducing noise, but it slightly lowers overall accuracy due to the loss of some semantic cues. The proposed Hybrid Stacked Model outperformed all pre-trained transformer models, achieving the highest accuracy of 96.77% at Level 1 and 83.06% at Level 2. 
A statistical t-test confirms the significance of these improvements. These findings emphasize the importance of hybrid models and domain-specific data preprocessing in enhancing requirement classification, with practical implications for automating early-stage software engineering tasks.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"231 ","pages":"Article 112594"},"PeriodicalIF":4.1000,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems and Software","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0164121225002638","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0
Abstract
Requirement classification in software engineering is essential for effective development, and automating it reduces human effort and supports decision-making. Previous studies have experimented with machine learning and deep learning models for requirement classification, but multilevel, multi-class classification remains underexplored. This research fills that gap by evaluating transformer-based models and a proposed Hybrid Stacked Model on a multilevel, multi-class classification task. To address the limitations of existing software requirement datasets (class imbalance, insufficient granularity, and a lack of real-world examples), we combined instances from the PROMISE_exp dataset, the PURE corpus, and 20 manually collected software requirement specification (SRS) documents, selected via a Boolean keyword search, to create a multilevel, multi-class dataset. The resulting 3072 requirements are organized into a two-level hierarchy: Level 1 (functional (FR)/non-functional (NFR)); Level 2 (FRs: core functional (CFR)/derived functional (DFR)/system integration (SI)/external dependency (ED); NFRs: product (PR)/organizational (OR)/external (ER)). We applied BERT-based context-aware text augmentation to address class imbalance, expanding the dataset to 3343 instances. The study also investigates the effects of domain-specific stopword removal and text augmentation on model performance. Results show that text augmentation boosts accuracy by 0.2–3.76% across all models. Stopword removal enhances precision and recall by reducing noise, but it slightly lowers overall accuracy due to the loss of some semantic cues. The proposed Hybrid Stacked Model outperformed all pre-trained transformer models, achieving the highest accuracy of 96.77% at Level 1 and 83.06% at Level 2. A statistical t-test confirms the significance of these improvements. These findings emphasize the importance of hybrid models and domain-specific data preprocessing in enhancing requirement classification, with practical implications for automating early-stage software engineering tasks.
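As a rough illustration of two ideas in the abstract — the two-level FR/NFR label hierarchy and domain-specific stopword removal — here is a minimal pure-Python sketch. The stopword list and the example requirement are hypothetical assumptions for illustration only; they are not the authors' actual preprocessing code or vocabulary.

```python
# Hypothetical sketch of the two-level labelling scheme and a
# domain-specific stopword step; the stopword list below is an
# illustrative assumption, not the paper's actual list.

# Level 1 -> Level 2 hierarchy, as described in the abstract.
HIERARCHY = {
    "FR": ["CFR", "DFR", "SI", "ED"],   # functional sub-classes
    "NFR": ["PR", "OR", "ER"],          # non-functional sub-classes
}

# A tiny, assumed domain-specific stopword list: words so frequent in
# requirement specifications that they carry little class signal.
DOMAIN_STOPWORDS = {"the", "a", "an", "system", "shall", "be", "to"}

def remove_domain_stopwords(requirement: str) -> list[str]:
    """Tokenise naively and drop domain-specific stopwords."""
    tokens = requirement.lower().replace(".", "").split()
    return [t for t in tokens if t not in DOMAIN_STOPWORDS]

def is_valid_label(level1: str, level2: str) -> bool:
    """Check that a Level-2 label is consistent with its Level-1 parent."""
    return level2 in HIERARCHY.get(level1, [])

req = "The system shall export reports to PDF."
print(remove_domain_stopwords(req))   # ['export', 'reports', 'pdf']
print(is_valid_label("FR", "CFR"))    # True
print(is_valid_label("NFR", "SI"))    # False
```

The consistency check mirrors the multilevel setup: a Level-2 prediction is only meaningful under the correct Level-1 branch, which is why the paper reports accuracy separately at each level.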
About the journal:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
•Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
•Agile, model-driven, service-oriented, open source and global software development
•Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
•Human factors and management concerns of software development
•Data management and big data issues of software systems
•Metrics and evaluation, data mining of software development resources
•Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.