An Empirical Evaluation of Machine Learning Algorithms for Identifying Software Requirements on Stack Overflow: Initial Results

Arshad Ahmad, Chong Feng, Adnan Tahir, Asif Khan, M. Waqas, Sadique Ahmad, A. Ullah
Published in: 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS)
Publication date: 2019-10-01
DOI: 10.1109/ICSESS47205.2019.9040720 (https://doi.org/10.1109/ICSESS47205.2019.9040720)
Citations: 2

Abstract

Context: Over the last decade or two, requirements engineering (RE) has increasingly turned to machine-learning (ML) algorithms to solve complex RE problems. One such problem is identifying and classifying software requirements on Stack Overflow (SO). ML-based techniques applied to this problem have shown convincing results, much better than those produced by some traditional natural language processing (NLP) techniques. Nevertheless, a comprehensive and systematic understanding of these ML-based techniques is still lacking. Objective: To identify and classify the types of ML algorithms used for identifying software requirements on SO. Method: This article reports a systematic literature review (SLR) gathering evidence published up to August 2019. Results: The study identified 1073 published papers related to RE and SO, of which only 12 were selected as primary papers. The data extraction process revealed that: 1) Latent Dirichlet Allocation (LDA) topic modeling is the most widely used ML algorithm in the selected studies, and 2) precision and recall are the most commonly used evaluation metrics for measuring the performance of these ML algorithms. Conclusion: The SLR finds that while ML algorithms have great potential for identifying software requirements on SO, they face open issues that will ultimately affect their performance and practical application. The SLR calls for collaboration between RE and ML researchers to tackle the open issues facing the development of real-world ML systems.
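Because the abstract names precision and recall as the most common evaluation metrics, a minimal sketch of how they are computed for a binary requirement classifier may help. The labels below are hypothetical (1 = the post states a requirement, 0 = it does not), and the function name `precision_recall` is an illustrative choice, not something taken from the reviewed studies.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical gold labels and classifier output for eight SO posts.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of posts flagged as requirements that really are requirements; recall is the share of real requirement posts that the classifier found.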