Datasets from Fifteen Years of Automated Requirements Traceability Research: Current State, Characteristics, and Quality

W. Zogaan, Palak Sharma, Mehdi Mirakhorli, V. Arnaoudova
Published in: 2017 IEEE 25th International Requirements Engineering Conference (RE), 2017-09-01
DOI: 10.1109/RE.2017.80 (https://doi.org/10.1109/RE.2017.80)
Citations: 21

Abstract

Software datasets play a crucial role in advancing automated software traceability research. Researchers can use them in different ways to develop or validate new automated approaches. The diversity and quality of the datasets within a research community have a significant impact on the accuracy, generalizability, and reproducibility of the results, and consequently on the usefulness and practicality of the techniques under study. Collecting such datasets and assessing their quality are not trivial tasks, and many researchers in software engineering have reported them as an obstacle. This paper presents a first-of-its-kind study that reviews and assesses the datasets used in software traceability research over the last fifteen years. It presents and articulates the current status of these datasets, their characteristics, and their threats to validity. Furthermore, this paper introduces a Traceability-Dataset Quality Assessment (T-DQA) framework to categorize software traceability datasets and to help researchers select appropriate datasets for their research, based on the characteristics of the datasets and the context in which they will be used.