Datasets from Fifteen Years of Automated Requirements Traceability Research: Current State, Characteristics, and Quality

2017 IEEE 25th International Requirements Engineering Conference (RE) Pub Date : 2017-09-01 DOI:10.1109/RE.2017.80

W. Zogaan, Palak Sharma, Mehdi Mirakhorli, V. Arnaoudova

{"title":"Datasets from Fifteen Years of Automated Requirements Traceability Research: Current State, Characteristics, and Quality","authors":"W. Zogaan, Palak Sharma, Mehdi Mirakhorli, V. Arnaoudova","doi":"10.1109/RE.2017.80","DOIUrl":null,"url":null,"abstract":"Software datasets play a crucial role in advancing automated software traceability research. They can be used by researchers in different ways to develop or validate new automated approaches. The diversity and quality of the datasets within a research community have a significant impact on the accuracy, generalizability, and reproducibility of the results and consequently on the usefulness and practicality of the techniques under study. Collecting and assessing the quality of such datasets are not trivial tasks and have been reported as an obstacle by many researchers in the domain of software engineering. This paper presents a first-of-its-kind study to review and assess the datasets that have been used in software traceability research over the last fifteen years. It presents and articulates the current status of these datasets, their characteristics, and their threats to validity. 
Furthermore, this paper introduces a Traceability-Dataset Quality Assessment (T-DQA) framework to categorize software traceability datasets and assist researchers to select appropriate datasets for their research based on different characteristics of the datasets and the context in which those datasets will be used.","PeriodicalId":176958,"journal":{"name":"2017 IEEE 25th International Requirements Engineering Conference (RE)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 25th International Requirements Engineering Conference (RE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RE.2017.80","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Citations: 21

Abstract

Software datasets play a crucial role in advancing automated software traceability research. Researchers use them in different ways to develop or validate new automated approaches. The diversity and quality of the datasets within a research community have a significant impact on the accuracy, generalizability, and reproducibility of results, and consequently on the usefulness and practicality of the techniques under study. Collecting such datasets and assessing their quality are not trivial tasks, and many researchers in software engineering have reported them as an obstacle. This paper presents a first-of-its-kind study that reviews and assesses the datasets used in software traceability research over the last fifteen years. It describes the current status of these datasets, their characteristics, and their threats to validity. Furthermore, this paper introduces a Traceability-Dataset Quality Assessment (T-DQA) framework to categorize software traceability datasets and to help researchers select appropriate datasets based on the characteristics of the datasets and the context in which they will be used.


