使用深度学习技术在语义上增强软件可追溯性

Jin Guo, Jinghui Cheng, J. Cleland-Huang
{"title":"使用深度学习技术在语义上增强软件可追溯性","authors":"Jin Guo, Jinghui Cheng, J. Cleland-Huang","doi":"10.1109/ICSE.2017.9","DOIUrl":null,"url":null,"abstract":"In most safety-critical domains the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases and other artifacts, however, creating such links manually is time consuming and error prone. Automated solutions use information retrieval and machine learning techniques to generate trace links, however, current techniques fail to understand semantics of the software artifacts or to integrate domain knowledge into the tracing process and therefore tend to deliver imprecise and inaccurate results. In this paper, we present a solution that uses deep learning to incorporate requirements artifact semantics and domain knowledge into the tracing solution. We propose a tracing network architecture that utilizes Word Embedding and Recurrent Neural Network (RNN) models to generate trace links. Word embedding learns word vectors that represent knowledge of the domain corpus and RNN uses these word vectors to learn the sentence semantics of requirements artifacts. We trained 360 different configurations of the tracing network using existing trace links in the Positive Train Control domain and identified the Bidirectional Gated Recurrent Unit (BI-GRU) as the best model for the tracing task. BI-GRU significantly out-performed state-of-the-art tracing methods including the Vector Space Model and Latent Semantic Indexing.","PeriodicalId":6505,"journal":{"name":"2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)","volume":"11 1","pages":"3-14"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"204","resultStr":"{\"title\":\"Semantically Enhanced Software Traceability Using Deep Learning Techniques\",\"authors\":\"Jin Guo, Jinghui Cheng, J. Cleland-Huang\",\"doi\":\"10.1109/ICSE.2017.9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In most safety-critical domains the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases and other artifacts, however, creating such links manually is time consuming and error prone. Automated solutions use information retrieval and machine learning techniques to generate trace links, however, current techniques fail to understand semantics of the software artifacts or to integrate domain knowledge into the tracing process and therefore tend to deliver imprecise and inaccurate results. In this paper, we present a solution that uses deep learning to incorporate requirements artifact semantics and domain knowledge into the tracing solution. We propose a tracing network architecture that utilizes Word Embedding and Recurrent Neural Network (RNN) models to generate trace links. Word embedding learns word vectors that represent knowledge of the domain corpus and RNN uses these word vectors to learn the sentence semantics of requirements artifacts. We trained 360 different configurations of the tracing network using existing trace links in the Positive Train Control domain and identified the Bidirectional Gated Recurrent Unit (BI-GRU) as the best model for the tracing task. BI-GRU significantly out-performed state-of-the-art tracing methods including the Vector Space Model and Latent Semantic Indexing.\",\"PeriodicalId\":6505,\"journal\":{\"name\":\"2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)\",\"volume\":\"11 1\",\"pages\":\"3-14\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"204\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSE.2017.9\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2017.9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 204

摘要

在大多数安全关键领域,对可追溯性的需求是由认证机构规定的。跟踪链接通常是在需求、设计、源代码、测试用例和其他工件之间创建的,然而,手动创建这样的链接既耗时又容易出错。自动化解决方案使用信息检索和机器学习技术来生成跟踪链接,然而,当前的技术无法理解软件工件的语义或将领域知识集成到跟踪过程中,因此倾向于提供不精确和不准确的结果。在本文中,我们提出了一个使用深度学习将需求工件语义和领域知识合并到跟踪解决方案中的解决方案。我们提出了一种利用词嵌入和递归神经网络(RNN)模型来生成跟踪链接的跟踪网络架构。词嵌入学习表示领域语料库知识的词向量,RNN使用这些词向量来学习需求工件的句子语义。我们使用正列车控制域中现有的跟踪链路训练了360种不同的跟踪网络配置,并确定双向门控循环单元(BI-GRU)是跟踪任务的最佳模型。BI-GRU显著优于最先进的跟踪方法,包括向量空间模型和潜在语义索引。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Semantically Enhanced Software Traceability Using Deep Learning Techniques
In most safety-critical domains the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases and other artifacts, however, creating such links manually is time consuming and error prone. Automated solutions use information retrieval and machine learning techniques to generate trace links, however, current techniques fail to understand semantics of the software artifacts or to integrate domain knowledge into the tracing process and therefore tend to deliver imprecise and inaccurate results. In this paper, we present a solution that uses deep learning to incorporate requirements artifact semantics and domain knowledge into the tracing solution. We propose a tracing network architecture that utilizes Word Embedding and Recurrent Neural Network (RNN) models to generate trace links. Word embedding learns word vectors that represent knowledge of the domain corpus and RNN uses these word vectors to learn the sentence semantics of requirements artifacts. We trained 360 different configurations of the tracing network using existing trace links in the Positive Train Control domain and identified the Bidirectional Gated Recurrent Unit (BI-GRU) as the best model for the tracing task. BI-GRU significantly out-performed state-of-the-art tracing methods including the Vector Space Model and Latent Semantic Indexing.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信