Why Your Experimental Results Might Be Wrong

F. Schuhknecht, Justus Henneberg
DOI: 10.1145/3592980.3595317
Published in: Proceedings of the 19th International Workshop on Data Management on New Hardware
Publication date: 2023-06-18
Citations: 1

Abstract

Research projects in the database community are often evaluated based on experimental results. A typical evaluation setup looks as follows: multiple methods to be compared with each other are embedded in a single shared benchmarking codebase. In this codebase, all methods execute an identical workload to collect the individual execution times. This seems reasonable: since the only difference between individual test runs is the methods themselves, any observed time difference can be attributed to these methods. Also, such a benchmarking codebase can be used for gradual optimization: if one method runs slowly, its code can be optimized and re-evaluated. If its performance improves, this improvement can be attributed to the particular optimization. Unfortunately, we had to learn the hard way that it is not that simple. The reason for this lies in a component that sits right between our benchmarking codebase and the produced experimental results: the compiler. As we will see in the following case study, this black-box component has the power to completely ruin any meaningful comparison between methods, even if we set up our experiments as equally and fairly as possible.