Proceedings of the 29th International Conference on Scientific and Statistical Database Management: Latest Articles, Page 3

Fast Equi-Join Algorithms on GPUs: Design and Implementation

Proceedings of the 29th International Conference on Scientific and Statistical Database Management Pub Date : 2017-06-27 DOI: 10.1145/3085504.3085521

Ran Rui, Yi-Cheng Tu

Abstract: Processing relational joins on modern GPUs has attracted much attention in the past few years. With the rapid development on the hardware and software environment in the GPU world, the existing GPU join algorithms designed for earlier architecture cannot make the most out of latest GPU products. In this paper, we report new design and implementation of join algorithms with high performance under today's GPGPU environment. This is a key component of our scientific database engine named G-SDMS. In particular, we overhaul the popular radix hash join and redesign sort-merge join algorithms on GPUs by applying a series of novel techniques to utilize the hardware capacity of latest Nvidia GPU architecture and new features of the CUDA programming framework. Our algorithms take advantage of revised hardware arrangement, larger register file and shared memory, native atomic operation, dynamic parallelism, and CUDA Streams. Experiments show that our new hash join algorithm is 2.0 to 14.6 times as efficient as existing GPU implementation, while the new sort-merge join achieves a speedup of 4.0X to 4.9X. Compared to the best CPU sort-merge join and hash join known to date, our optimized code achieves up to 10.5X and 5.5X speedup. Moreover, we extend our design to scenarios where large data tables cannot fit in the GPU memory.

Citations: 41
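
Note on the entry above: the abstract refers to a radix hash join. As a rough illustration of the general idea only, the following is a minimal single-threaded sketch in Python, not the authors' GPU implementation. It assumes integer join keys and rows stored as tuples; the function names `radix_partition` and `radix_hash_join` and the `bits` parameter are hypothetical and chosen here for illustration.

```python
from collections import defaultdict


def radix_partition(rows, key_index, bits):
    """Split rows into 2**bits buckets using the low-order bits of the join key."""
    mask = (1 << bits) - 1
    buckets = [[] for _ in range(1 << bits)]
    for row in rows:
        buckets[row[key_index] & mask].append(row)
    return buckets


def radix_hash_join(left, right, left_key=0, right_key=0, bits=2):
    """Equi-join two lists of tuples on integer keys, one partition pair at a time."""
    left_parts = radix_partition(left, left_key, bits)
    right_parts = radix_partition(right, right_key, bits)
    result = []
    for lpart, rpart in zip(left_parts, right_parts):
        # Build a hash table on the left side of this partition.
        table = defaultdict(list)
        for lrow in lpart:
            table[lrow[left_key]].append(lrow)
        # Probe it with the right side; matching keys always share a partition.
        for rrow in rpart:
            for lrow in table.get(rrow[right_key], []):
                result.append(lrow + rrow)
    return result
```

Because equal keys have equal low-order bits, matching rows always land in the same partition pair, so each pair can be joined independently. That independence is what makes the scheme attractive for parallel hardware, although this sketch runs the pairs sequentially.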

Managing Sensor Data Streams: Lessons Learned from the WeBike Project

Proceedings of the 29th International Conference on Scientific and Statistical Database Management Pub Date : 2017-06-27 DOI: 10.1145/3085504.3085505

Christian Gorenflo, Lukasz Golab, S. Keshav

Citations: 5

FlatFIT: Accelerated Incremental Sliding-Window Aggregation For Real-Time Analytics

Proceedings of the 29th International Conference on Scientific and Statistical Database Management Pub Date : 2017-06-27 DOI: 10.1145/3085504.3085509

Anatoli U. Shein, Panos K. Chrysanthis, Alexandros Labrinidis

Abstract: Data stream processing is becoming essential in most current advanced scientific or business applications as data production rates are increasing. Different companies compete to efficiently ingest high velocity data and apply some form of computation in order to make better business decisions. In order to successfully compete in this environment, companies are focusing on the most recent data within a count or time-based window by continuously executing aggregate queries on it. Incremental sliding-window computation is commonly used to avoid the performance implications of re-evaluating the aggregate value of the window from scratch on every update. The state-of-the-art FlatFAT technique executes ACQs with high efficiency but it does not scale well with the increasing workloads. In this paper we propose a novel algorithm, FlatFIT, that accelerates such calculations by intelligently maintaining index structures, leading to higher reuse of intermediate calculations and thus exceptional scalability in systems with heavy workloads. Our theoretical analysis shows that FlatFIT is superior in both time and space complexities compared to FlatFAT, while maintaining the same query generality. Given a window of size n, FlatFIT achieves constant algorithmic complexity compared to O(log(n)) complexity of FlatFAT. We experimentally show that FlatFIT achieves up to a 17x throughput improvement over FlatFAT for the same input workload while using less memory.

Citations: 24
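
Note on the entry above: the abstract contrasts incremental sliding-window computation with re-evaluating the window from scratch on every update. The sketch below illustrates that contrast for a count-based window and a sum aggregate. It is neither FlatFIT nor FlatFAT: it uses the simple subtract-on-evict technique, which only works for invertible aggregates such as sum or count. The names `SlidingSum` and `naive_sums` are hypothetical.

```python
from collections import deque


class SlidingSum:
    """Count-based sliding-window sum maintained incrementally."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.items = deque()
        self.total = 0

    def insert(self, value):
        """Add a value, evict the oldest one if the window overflows, return the sum."""
        self.items.append(value)
        self.total += value
        if len(self.items) > self.window_size:
            # Subtract-on-evict: valid only because addition has an inverse.
            self.total -= self.items.popleft()
        return self.total


def naive_sums(values, window_size):
    """Reference implementation: recompute every window sum from scratch."""
    return [
        sum(values[max(0, i - window_size + 1): i + 1])
        for i in range(len(values))
    ]
```

Each `insert` does a constant amount of work, whereas `naive_sums` touches up to `window_size` elements per update. Aggregates without an inverse (for example max) need a different structure, which is the more general setting the abstract's techniques address.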

Computing Isochrones in Multimodal Spatial Networks using Tile Regions

Proceedings of the 29th International Conference on Scientific and Statistical Database Management Pub Date : 2017-06-27 DOI: 10.1145/3085504.3085538

Nikolaus Krismer, Doris Silbernagl, Günther Specht, J. Gamper

Abstract: This paper describes a new method to compute isochrones in multimodal spatial networks, which aims at finding a good trade-off between memory usage and runtime. In the past, approaches based on Dijkstra's algorithm have been proposed. For small networks, the entire network is first loaded in main memory, where the network is expanded to determine the isochrone. For large networks that do not fit in main memory, approaches that load the network vertex-by-vertex during the expansion phase have been proposed. They keep the memory footprint minimal, but have to query the database for each node in the isochrone, which can be very time consuming. The method presented in this paper uses tiles (which are well known from interactive online maps) to realize chunk-loading of vertices by utilizing so-called tile regions. This approach significantly reduces the number of database requests, while keeping the memory usage low. Our method is able to compute isochrones even in large networks at a reasonable time. An experimental evaluation shows that the new algorithm clearly outperforms previous competitive approaches such as MINE and MINEX.

Citations: 5
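
Note on the entry above: the abstract mentions earlier approaches that expand an in-memory network with Dijkstra's algorithm to determine the isochrone. A minimal sketch of that baseline follows. It does not implement the paper's tile regions, MINE, or MINEX. It assumes a single-mode network held entirely in memory as a dictionary of `(neighbor, travel_time)` lists with non-negative travel times, and it returns only the reachable vertices, ignoring partially reachable edges and transit schedules. The function name `isochrone` and the `budget` parameter are hypothetical.

```python
import heapq


def isochrone(graph, source, budget):
    """Return {vertex: travel_time} for every vertex reachable within budget.

    graph maps a vertex to a list of (neighbor, travel_time) pairs.
    """
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        time, vertex = heapq.heappop(heap)
        if time > best.get(vertex, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, cost in graph.get(vertex, []):
            new_time = time + cost
            # Prune the expansion as soon as the time budget is exceeded.
            if new_time <= budget and new_time < best.get(neighbor, float("inf")):
                best[neighbor] = new_time
                heapq.heappush(heap, (new_time, neighbor))
    return best
```

In the vertex-by-vertex variants described in the abstract, the `graph.get(vertex, [])` lookup would correspond to a database query per expanded vertex, which is the cost that chunk-loading aims to reduce.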

Location and Processing Aware Datacube Caching

Proceedings of the 29th International Conference on Scientific and Statistical Database Management Pub Date : 2017-06-27 DOI: 10.1145/3085504.3085539

Veranika Liaukevich, D. Misev, P. Baumann, Vlad Merticariu

Citations: 3

Generating What-If Scenarios for Time Series Data

Proceedings of the 29th International Conference on Scientific and Statistical Database Management Pub Date : 2017-06-27 DOI: 10.1145/3085504.3085507

Lars Kegel, M. Hahmann, Wolfgang Lehner

Citations: 10

Mining Persistent and Discriminative Communities in Graph Ensembles

Proceedings of the 29th International Conference on Scientific and Statistical Database Management Pub Date : 2017-06-27 DOI: 10.1145/3085504.3085532

Steve Harenberg, Mandar S. Chaudhary, N. Samatova

Abstract: Detecting all communities in a single graph is a prevalent task in graph data analytics. However, many scientific applications naturally create data as an ensemble of graphs. For example, graph ensembles can be created from multiple: social networks at distinct points in time, biological networks created from independent experiments, and global climate networks created from unique climate models. In this work, we present a method for enumerating community subsets across an ensemble of graphs, with the ability to detect both persistent and discriminative subcommunities. Moreover, we support queries, consisting of user-specified vertices of interest and arbitrary ensemble slices, to produce output that is more relevant to the user while reducing output size and computation time. While related methods are designed around a single community definition, our method is designed around the idea that choosing an appropriate community definition often depends on the application at hand. Therefore, our goal is to provide a framework that can leverage the abundance of community detection methods available when discovering persistent and discriminative substructures.

Citations: 0

High-Definition Digital Elevation Model System Vision Paper

Proceedings of the 29th International Conference on Scientific and Statistical Database Management Pub Date : 2017-06-27 DOI: 10.1145/3085504.3085533

Andi Zang, Xin Chen, Goce Trajcevski

Abstract: Digital Elevation Modeling (DEM) has been a widely used methodology in plethora of application domains, ranging from climate and geological studies, through temporal evolution of various migration patterns, to Geographic Information Systems (GIS) broadly. However, the existing DEM methodologies and systems cannot quite straightforwardly be extended to catch up with the demands due to recent developments in autonomous driving, vehicle localization, drone and dynamically evolving high-definition smart city modeling. The new challenges are the demand of higher precision, sparse(r) elevation data compression, real-time efficient retrieval and intra-sources data integration. Motivated by this, we take a first step towards developing a tile based, multi-layer high precision DEM system, which aims at seamlessly integrating (and aligning) DEM from different sources, and enables context-driven variations in zoom levels. In addition, to further improve the efficiency of the focused-retrieval of the data necessary to construct the DEM with the desired quality assurance, our vision targets the collaborative compression among heterogeneous data sources.

Citations: 5

A Benchmark for Betweenness Centrality Approximation Algorithms on Large Graphs

Proceedings of the 29th International Conference on Scientific and Statistical Database Management Pub Date : 2017-06-27 DOI: 10.1145/3085504.3085510

Ziyad AlGhamdi, Fuad Jamour, Spiros Skiadopoulos, Panos Kalnis

Abstract: Betweenness centrality quantifies the importance of graph nodes in a variety of applications including social, biological and communication networks. Its computation is very costly for large graphs; therefore, many approximate methods have been proposed. Given the lack of a golden standard, the accuracy of most approximate methods is evaluated on tiny graphs and is not guaranteed to be representative of realistic datasets that are orders of magnitude larger. In this paper, we develop BeBeCA, a benchmark for betweenness centrality approximation methods on large graphs. Specifically: (i) We generate a golden standard by deploying a parallel implementation of Brandes algorithm using 96,000 CPU cores on a supercomputer to compute exact betweenness centrality values for several large graphs with up to 126M edges. (ii) We propose an evaluation methodology to assess various aspects of approximation accuracy, such as average error and quality of node ranking. (iii) We survey a large number of existing approximation methods and compare their performance and accuracy using our benchmark. (iv) We publicly share our benchmark, which includes the golden standard exact betweenness centrality values together with the scripts that implement our evaluation methodology; for researchers to compare their own algorithms and practitioners to select the appropriate algorithm for their application and data.

Citations: 32
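
Note on the entry above: the abstract's exact reference values come from a parallel implementation of Brandes' algorithm. For orientation, here is a minimal serial sketch of Brandes' algorithm, not the benchmark's code. It assumes an unweighted, undirected graph given as a dictionary in which every vertex appears as a key and each edge is listed in both directions. The function name `betweenness_centrality` is hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Exact betweenness for an unweighted, undirected graph {vertex: [neighbors]}."""
    centrality = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Phase 1: BFS from s, counting shortest paths and recording predecessors.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}   # number of shortest s-v paths
        dist = {v: -1 for v in adjacency}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in centrality.items()}
```

The outer loop runs one BFS per source vertex, so the serial cost is O(V * E) for unweighted graphs. The per-source iterations are independent of one another, which is one reason the exact computation can be spread over many cores as the abstract describes.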

Edge Labeling Schemes for Graph Data

Proceedings of the 29th International Conference on Scientific and Statistical Database Management Pub Date : 2017-06-27 DOI: 10.1145/3085504.3085516

Oshini Goonetilleke, Danai Koutra, T. Sellis, Kewen Liao

Citations: 9