[Tool] Designing Replicable Networking Experiments With TriScale

J. Syst. Res. | Pub Date: 2021-09-22 | DOI: 10.5070/sr31155408
Romain Jacob, Marco Zimmerling, C. Boano, L. Vanbever, L. Thiele
Citations: 4

Abstract

When designing their performance evaluations, networking researchers often encounter questions such as: How long should a run be? How many runs to perform? How to account for the variability across multiple runs? What statistical methods should be used to analyze the data? Despite their best intentions, researchers often answer these questions differently, thus impairing the replicability of their evaluations and the confidence in their results. In this paper, we propose a concrete methodology for the design and analysis of performance evaluations. Our approach hierarchically partitions the performance evaluation into three timescales, following the principle of separation of concerns. The idea is to understand, for each timescale, the temporal characteristics of variability sources, and then to apply rigorous statistical methods to derive performance results with quantifiable confidence in spite of the inherent variability. We implement this methodology in a software framework called TriScale. For each performance metric, TriScale computes a variability score that estimates, with a given confidence, how similar the results would be if the evaluation were replicated; in other words, TriScale quantifies the replicability of evaluations. We showcase the practicality and usefulness of TriScale on four different case studies, demonstrating that TriScale helps to generalize and strengthen published results. Improving the standards of replicability in networking is a complex challenge. This paper is an important contribution to this endeavor; it provides networking researchers with a rational and concrete experimental methodology rooted in sound statistical foundations, the first of its kind.
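
Two of the questions above, how many runs to perform and how replicable a result is, can be approached with distribution-free (order-statistics) arguments of the kind the abstract alludes to. The sketch below is our own illustration, not TriScale's actual code: the function names, the defaults, and the specific choice of the width of a median confidence interval as a stand-in for the variability score are assumptions.

```python
# Illustrative sketch only; TriScale's real API and statistical choices may differ.
import math
import numpy as np
from scipy.stats import binom


def min_runs(percentile: float, confidence: float) -> int:
    """Smallest number of runs N such that the sample minimum is a lower
    bound on the given percentile with the given confidence:
    P(min <= q_p) = 1 - (1 - p)^N >= C  =>  N >= log(1 - C) / log(1 - p)."""
    p, c = percentile, confidence
    return math.ceil(math.log(1 - c) / math.log(1 - p))


def median_ci(samples, confidence: float = 0.95):
    """Distribution-free confidence interval for the median, built from
    symmetric order statistics x_(k) and x_(n-k+1)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    # Largest k (narrowest interval) whose coverage still meets `confidence`.
    for k in range(n // 2, 0, -1):
        coverage = binom.cdf(n - k, n, 0.5) - binom.cdf(k - 1, n, 0.5)
        if coverage >= confidence:
            return x[k - 1], x[n - k], coverage
    # Too few samples for the requested confidence: fall back to the full range.
    return x[0], x[-1], 1 - 2 * 0.5 ** n


def variability_score(kpis, confidence: float = 0.95) -> float:
    """Width of the median CI across per-series KPIs: the narrower it is,
    the more similar a replicated evaluation is expected to be."""
    lo, hi, _ = median_ci(kpis, confidence)
    return hi - lo


if __name__ == "__main__":
    # How many runs to lower-bound the 10th percentile with 95% confidence?
    print(min_runs(0.10, 0.95))  # -> 29

    # Variability score over synthetic per-series KPI values.
    rng = np.random.default_rng(1)
    kpis = rng.normal(loc=100.0, scale=5.0, size=25)
    print(round(variability_score(kpis, 0.95), 2))
```

Under these definitions, lower-bounding the 10th percentile with 95% confidence requires 29 runs, while bounding the median requires only 5; the narrower the resulting interval across series, the more confident one can be that a replication would report a similar KPI.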