Detecting Requirements Smells With Deep Learning: Experiences, Challenges and Future Work

2021 IEEE 29th International Requirements Engineering Conference Workshops (REW). Published 2021-08-06. DOI: 10.1109/REW53955.2021.00027

Mohammad Kasra Habib, S. Wagner, D. Graziotin


Citations: 4

Abstract



Requirements Engineering (RE) is one of the initial phases of building a software system. The success or failure of a software project is firmly tied to this phase, which relies on communication among stakeholders in natural language. The problem with natural language is that, unless the stakeholders involved express themselves precisely, it easily leads to different interpretations. The result is a product that differs from the expected one. Previous work proposed improving the quality of software requirements by detecting language errors based on the ISO 29148 requirements language criteria. The existing solutions apply classical Natural Language Processing (NLP) to detect them. Classical NLP has limitations, such as domain dependence, which results in poor generalization. Therefore, this work aims to improve on the previous work by creating a manually labeled dataset and applying ensemble learning, Deep Learning (DL), and techniques such as word embeddings and transfer learning, in order to overcome the generalization problem of classical NLP and to improve precision and recall. The current findings show that the dataset is unbalanced and indicate which classes need more examples. It is tempting to train algorithms even when the dataset is not sufficiently representative. Consequently, the results show that the models are overfitting. In Machine Learning, this issue is addressed by adding more instances to the dataset, improving label quality, removing noise, and reducing the complexity of the learning algorithms, all of which are planned for this research.
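The abstract reports two diagnostics: the labeled dataset is unbalanced (some classes need more examples) and the trained models overfit. The following is a minimal sketch of how one might quantify both, assuming each requirement sentence has already been labeled with a single smell class. It is not the authors' pipeline. The label names (`none`, `subjective`, `vague_pronoun`), the thresholds `min_share` and `tolerance`, and the toy counts are all hypothetical. Inverse-frequency class weighting is included as one common mitigation for imbalance, although the abstract does not mention it.

```python
from collections import Counter


def class_distribution(labels):
    """Count how many examples each class label has."""
    return Counter(labels)


def imbalance_ratio(labels):
    """Largest class size divided by smallest class size (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def underrepresented_classes(labels, min_share=0.1):
    """Classes whose share of all examples is below min_share.

    min_share is a hypothetical threshold; the returned classes are the
    candidates for collecting more labeled examples.
    """
    counts = Counter(labels)
    n = len(labels)
    return sorted(c for c, k in counts.items() if k / n < min_share)


def balanced_class_weights(labels):
    """Inverse-frequency weights n_samples / (n_classes * count_c).

    Same formula as scikit-learn's class_weight="balanced" heuristic:
    rare classes get proportionally larger weights in the loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def overfitting_gap(train_score, val_score, tolerance=0.1):
    """Return (gap, flagged), where flagged is True when the training score
    exceeds the validation score by more than the hypothetical tolerance."""
    gap = train_score - val_score
    return gap, gap > tolerance


if __name__ == "__main__":
    # Toy labels, NOT the paper's dataset: one smell class per requirement.
    labels = ["none"] * 80 + ["subjective"] * 15 + ["vague_pronoun"] * 5

    print(class_distribution(labels))        # Counter({'none': 80, 'subjective': 15, 'vague_pronoun': 5})
    print(imbalance_ratio(labels))           # 16.0
    print(underrepresented_classes(labels))  # ['vague_pronoun']
    print({c: round(w, 2) for c, w in balanced_class_weights(labels).items()})
    # {'none': 0.42, 'subjective': 2.22, 'vague_pronoun': 6.67}

    # Hypothetical F1 scores on the training and validation splits.
    print(overfitting_gap(0.98, 0.71)[1])    # True
```

A large training/validation gap is only a symptom. The remedies the abstract plans (more instances, better labels, less noise, simpler models) act on its causes, and class weighting only rebalances how much each existing example counts.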
