Orderings of Data - More Than a Tripping Hazard: Visionary

32nd International Conference on Scientific and Statistical Database Management. Published 2020-07-07. DOI: 10.1145/3400903.3400911

A. Beer, Valentin Hartmann, T. Seidl

Abstract

As data processing techniques get more sophisticated every day, many of us researchers get lost in the details and subtleties of the algorithms we are developing and far too easily forget to look at the very first step of every algorithm: the input of the data. Since there are plenty of library functions for this task, we indeed do not have to think about this part of the pipeline anymore. But maybe we should. All data is stored and loaded into a program in some order. In this vision paper we study how ignoring this order can (1) lead to performance issues and (2) make research results unreproducible. We furthermore examine desirable properties of a data ordering and why current approaches are often not suited to tackle these two problems.
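The reproducibility concern can be illustrated with a small sketch. This example is not taken from the paper; it assumes two simple order-sensitive computations chosen only for illustration: plain floating-point summation, and a hypothetical `leader_clustering` helper (a basic sequential "leader" clustering in which each point joins the first existing cluster whose leader lies within `threshold`). In both cases, feeding the same values in a different order changes the result.

```python
from typing import List, Sequence


def leader_clustering(points: Sequence[float], threshold: float) -> List[List[float]]:
    """Hypothetical sequential clustering whose outcome depends on input order."""
    leaders: List[float] = []
    clusters: List[List[float]] = []
    for p in points:
        for i, leader in enumerate(leaders):
            # Join the first cluster whose leader is close enough.
            if abs(p - leader) <= threshold:
                clusters[i].append(p)
                break
        else:
            # No leader within reach: p starts a new cluster and becomes its leader.
            leaders.append(p)
            clusters.append([p])
    return clusters


if __name__ == "__main__":
    # Same values, two orders: floating-point addition is not associative.
    print(sum([1e16, 1.0, -1e16]))  # 0.0 (the 1.0 is lost to rounding)
    print(sum([1e16, -1e16, 1.0]))  # 1.0

    # Same points, two orders: a different number of clusters.
    print(leader_clustering([0.0, 1.0, 2.0], threshold=1.0))  # [[0.0, 1.0], [2.0]]
    print(leader_clustering([1.0, 0.0, 2.0], threshold=1.0))  # [[1.0, 0.0, 2.0]]
```

Neither computation is wrong in isolation; the point is that a result reported without the input order that produced it may not be reproducible.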
