Semi-automated Software Requirements Categorisation using Machine Learning Algorithms

Impact factor: 0.9; JCR quartile: Q4 (ENGINEERING, ELECTRICAL & ELECTRONIC)

International Journal of Electrical and Computer Engineering Systems. Pub Date: 2023-12-12. DOI: 10.32985/ijeces.14.10.3

Pratvina Talele, Siddharth Apte, R. Phalnikar, Harsha V. Talele


Citations: 0

Abstract

Requirements engineering is a mandatory phase of the software development life cycle (SDLC) in which system requirements are defined and documented in the Software Requirements Specification (SRS). As system complexity increases, categorising requirements into functional and non-functional becomes difficult. At present, the lack of automated techniques forces reliance on labour-intensive, time-consuming manual methods. This research addresses that gap by comparing two prominent feature extraction techniques and their effectiveness in automating requirements classification. Natural language processing methods are used for text pre-processing, followed by Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec for feature extraction. The extracted features are the input to the machine learning algorithms. The study assesses six existing algorithms, Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Neural Network (NN), K-Nearest Neighbour (KNN) and Support Vector Machine (SVM), on precision and accuracy. On the publicly available PURE dataset, TF-IDF features categorised requirements better than Word2Vec features: TF-IDF reached 91.20% accuracy with SVM and Random Forest, compared with 87.36% for Word2Vec with SVM, a difference of 3.84 percentage points. We believe these results will help developers build products that support requirements engineering.
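The pipeline described in the abstract (pre-processing, TF-IDF feature extraction, then a classifier) can be illustrated with a minimal sketch using only the Python standard library. This is not the authors' implementation: the six requirement sentences, the stop-word list, and the labels "F" and "NF" are hypothetical and are not taken from the PURE dataset, the smoothed IDF formula is one common TF-IDF variant that the abstract does not specify, and a small K-Nearest Neighbour vote over cosine similarity stands in for the six algorithms the study evaluated.

```python
import math
import re
from collections import Counter

# Hypothetical toy requirements for illustration; NOT taken from the PURE dataset.
TRAIN = [
    ("The system shall allow users to reset their password", "F"),
    ("The system shall export reports as PDF files", "F"),
    ("Users shall be able to search orders by date", "F"),
    ("The system shall respond to queries within two seconds", "NF"),
    ("The application shall be available 99 percent of the time", "NF"),
    ("All stored data shall be encrypted for security", "NF"),
]

# Assumed stop-word list; the abstract does not say which one was used.
STOPWORDS = {"the", "shall", "be", "to", "of", "as", "for", "by", "all", "their", "a", "an"}


def tokenize(text):
    """Lowercase, keep alphanumeric tokens, drop stop words (minimal pre-processing)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalised TF-IDF vector as a dict; terms unseen in training are ignored."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (their dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(text, train_vecs, labels, idf, k=3):
    """Majority vote among the k training requirements most similar to `text`."""
    query = tfidf(tokenize(text), idf)
    ranked = sorted(zip(train_vecs, labels), key=lambda p: cosine(query, p[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train_docs = [tokenize(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]

prediction = knn_predict("Users shall be able to export orders", train_vecs, train_labels, idf)
print(prediction)  # F
```

A Word2Vec variant of the same sketch would replace `tfidf` with an average of learned word embeddings per requirement, which needs a trained embedding model and is therefore left out here.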




Source journal

International Journal of Electrical and Computer Engineering Systems (category: ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore: 1.20

Self-citation rate: 11.80%


About the journal: The International Journal of Electrical and Computer Engineering Systems publishes original research in the form of full papers, case studies, reviews and surveys. It covers theory and application of electrical and computer engineering, synergy of computer systems and computational methods with electrical and electronic systems, as well as interdisciplinary research. Topics: power systems; renewable electricity production; power electronics; electrical drives; industrial electronics; communication systems; advanced modulation techniques; RFID devices and systems; signal and data processing; image processing; multimedia systems; microelectronics; instrumentation and measurement; control systems; robotics; modeling and simulation; modern computer architectures; computer networks; embedded systems; high-performance computing; engineering education; parallel and distributed computer systems; human-computer systems; intelligent systems; multi-agent and holonic systems; real-time systems; software engineering; Internet and web applications and systems; applications of computer systems in engineering and related disciplines; mathematical models of engineering systems; engineering management.