An Empirical Evaluation of Machine Learning Algorithms for Identifying Software Requirements on Stack Overflow: Initial Results

2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS). Pub Date: 2019-10-01. DOI: 10.1109/ICSESS47205.2019.9040720

Arshad Ahmad, Chong Feng, Adnan Tahir, Asif Khan, M. Waqas, Sadique Ahmad, A. Ullah


Citations: 2

Abstract

Context: Over the last decade or two, requirements engineering (RE) methods have increasingly used machine-learning (ML) algorithms to solve complex RE problems. One such problem is identifying and classifying software requirements on Stack Overflow (SO). ML-based techniques have shown convincing results on this problem, much better than those generated by some traditional natural language processing (NLP) techniques. Nevertheless, a comprehensive and systematic understanding of these ML-based techniques is still lacking. Objective: To identify and classify the types of ML algorithms used for identifying software requirements on SO. Method: This article reports a systematic literature review (SLR) gathering evidence published up to August 2019. Results: This study identified 1073 published papers related to RE and SO, of which only 12 were selected as primary papers. The data extraction process revealed that: 1) Latent Dirichlet Allocation (LDA) topic modeling is the most widely used ML algorithm in the selected studies, and 2) precision and recall are the most commonly used evaluation metrics for measuring the performance of these ML algorithms. Conclusion: The SLR finds that while ML algorithms have great potential for identifying software requirements on SO, they face some open issues that will ultimately affect their performance and practical application. The SLR calls for collaboration between RE and ML researchers to tackle the open issues facing the development of real-world ML systems.
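
For readers unfamiliar with the two evaluation metrics named in the results, the following is a minimal sketch of how precision and recall are computed for a binary "requirement / not a requirement" classifier. It is illustrative only: the function name precision_recall and the label lists are hypothetical and are not taken from the paper or from any of the studies it reviews.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when no positives are predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = the post states a requirement, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of posts flagged as requirements that really are requirements; recall is the share of true requirement posts that the classifier finds. Reporting both together shows whether a classifier gains one at the expense of the other.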

