Evaluation Methods and Replicability of Software Architecture Research Objects

M. Konersmann, Angelika Kaplan, Thomas Kühn, R. Heinrich, A. Koziolek, R. Reussner, J. Jürjens, Mahmood S. Al-Doori, Nicolas Boltz, Marco Ehl, Dominik Fuchß, Katharina Groser, Sebastian Hahner, Jan Keim, Matthias Lohr, Timur Saglam, Sophie Schulz, Jan-Philipp Töberg
{"title":"Evaluation Methods and Replicability of Software Architecture Research Objects","authors":"M. Konersmann, Angelika Kaplan, Thomas Kühn, R. Heinrich, A. Koziolek, R. Reussner, J. Jürjens, Mahmood S. Al-Doori, Nicolas Boltz, Marco Ehl, Dominik Fuchß, Katharina Groser, Sebastian Hahner, Jan Keim, Matthias Lohr, Timur Saglam, Sophie Schulz, Jan-Philipp Töberg","doi":"10.1109/ICSA53651.2022.00023","DOIUrl":null,"url":null,"abstract":"Context: Software architecture (SA) as research area experienced an increase in empirical research, as identified by Galster and Weyns in 2016 [1]. Empirical research builds a sound foundation for the validity and comparability of the research. A current overview on the evaluation and replicability of SA research objects could help to discuss our empirical standards as a community. However, no such current overview exists.Objective: We aim at assessing the current state of practice of evaluating SA research objects and replication artifact provision in full technical conference papers from 2017 to 2021.Method: We first create a categorization of papers regarding their evaluation and provision of replication artifacts. In a systematic literature review (SLR) with 153 papers we then investigate how SA research objects are evaluated and how artifacts are made available.Results: We found that technical experiments (28%) and case studies (29%) are the most frequently used evaluation methods over all research objects. Functional suitability (46% of evaluated properties) and performance (29%) are the most evaluated properties. 17 papers (11%) provide replication packages and 97 papers (63%) explicitly state threats to validity. 17% of papers reference guidelines for evaluations and 14% of papers reference guidelines for threats to validity.Conclusions: Our results indicate that the generalizability and repeatability of evaluations could be improved to enhance the maturity of the field; although, there are valid reasons for contributions to not publish their data. We derive from our findings a set of four proposals for improving the state of practice in evaluating software architecture research objects. Researchers can use our results to find recommendations on relevant properties to evaluate and evaluation methods to use and to identify reusable evaluation artifacts to compare their novel ideas with other research. Reviewers can use our results to compare the evaluation and replicability of submissions with the state of the practice.","PeriodicalId":179123,"journal":{"name":"2022 IEEE 19th International Conference on Software Architecture (ICSA)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 19th International Conference on Software Architecture (ICSA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSA53651.2022.00023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Context: Software architecture (SA) as a research area has experienced an increase in empirical research, as identified by Galster and Weyns in 2016 [1]. Empirical research builds a sound foundation for the validity and comparability of research. A current overview of the evaluation and replicability of SA research objects could help the community discuss its empirical standards. However, no such overview exists.

Objective: We aim to assess the current state of practice in evaluating SA research objects and providing replication artifacts in full technical conference papers from 2017 to 2021.

Method: We first create a categorization of papers regarding their evaluation and provision of replication artifacts. In a systematic literature review (SLR) of 153 papers, we then investigate how SA research objects are evaluated and how artifacts are made available.

Results: We found that technical experiments (28%) and case studies (29%) are the most frequently used evaluation methods across all research objects. Functional suitability (46% of evaluated properties) and performance (29%) are the most frequently evaluated properties. 17 papers (11%) provide replication packages, and 97 papers (63%) explicitly state threats to validity. 17% of papers reference guidelines for evaluations, and 14% reference guidelines for threats to validity.

Conclusions: Our results indicate that the generalizability and repeatability of evaluations could be improved to enhance the maturity of the field, although there are valid reasons for contributions not to publish their data. From our findings we derive a set of four proposals for improving the state of practice in evaluating software architecture research objects. Researchers can use our results to find recommendations on relevant properties to evaluate and on evaluation methods to use, and to identify reusable evaluation artifacts for comparing their novel ideas with other research. Reviewers can use our results to compare the evaluation and replicability of submissions with the state of the practice.
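
As a quick plausibility check on the figures reported in the Results, the following minimal Python sketch recomputes the percentages for the two absolute counts given in the abstract (replication packages and explicit threats to validity) against the 153-paper sample. The variable names and the rounding convention are illustrative assumptions, not part of the original study.

```python
# Minimal sketch (illustrative assumption, not part of the original study):
# recompute the reported percentages from the absolute counts in the abstract.

TOTAL_PAPERS = 153  # papers included in the systematic literature review

reported_counts = {
    "replication packages": 17,          # reported as 11% in the abstract
    "explicit threats to validity": 97,  # reported as 63% in the abstract
}

for label, count in reported_counts.items():
    share = 100 * count / TOTAL_PAPERS
    print(f"{label}: {count}/{TOTAL_PAPERS} = {share:.1f}% (~{round(share)}%)")

# Expected output:
# replication packages: 17/153 = 11.1% (~11%)
# explicit threats to validity: 97/153 = 63.4% (~63%)
```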