{"title":"Fast Equi-Join Algorithms on GPUs: Design and Implementation","authors":"Ran Rui, Yi-Cheng Tu","doi":"10.1145/3085504.3085521","DOIUrl":"https://doi.org/10.1145/3085504.3085521","url":null,"abstract":"Processing relational joins on modern GPUs has attracted much attention in the past few years. With the rapid development on the hardware and software environment in the GPU world, the existing GPU join algorithms designed for earlier architecture cannot make the most out of latest GPU products. In this paper, we report new design and implementation of join algorithms with high performance under today's GPGPU environment. This is a key component of our scientific database engine named G-SDMS. In particular, we overhaul the popular radix hash join and redesign sort-merge join algorithms on GPUs by applying a series of novel techniques to utilize the hardware capacity of latest Nvidia GPU architecture and new features of the CUDA programming framework. Our algorithms take advantage of revised hardware arrangement, larger register file and shared memory, native atomic operation, dynamic parallelism, and CUDA Streams. Experiments show that our new hash join algorithm is 2.0 to 14.6 times as efficient as existing GPU implementation, while the new sort-merge join achieves a speedup of 4.0X to 4.9X. Compared to the best CPU sort-merge join and hash join known to date, our optimized code achieves up to 10.5X and 5.5X speedup. Moreover, we extend our design to scenarios where large data tables cannot fit in the GPU memory.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123957527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing Sensor Data Streams: Lessons Learned from the WeBike Project","authors":"Christian Gorenflo, Lukasz Golab, S. Keshav","doi":"10.1145/3085504.3085505","DOIUrl":"https://doi.org/10.1145/3085504.3085505","url":null,"abstract":"We present insights on data management resulting from a field deployment of approximately 30 sensor-equipped electric bicycles (e-bikes) at the University of Waterloo. The trial has been in operation for the last two-and-a-half years, and we have collected and analyzed more than 150 gigabytes of data. We discuss best practices for the entire data management process, spanning data collection, extract-transform-load, data cleaning, and choosing a suitable data management ecosystem. We also comment on how our experiences will inform the design of a future large-scale field trial involving several thousand fully-instrumented e-bikes.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124405894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FlatFIT: Accelerated Incremental Sliding-Window Aggregation For Real-Time Analytics","authors":"Anatoli U. Shein, Panos K. Chrysanthis, Alexandros Labrinidis","doi":"10.1145/3085504.3085509","DOIUrl":"https://doi.org/10.1145/3085504.3085509","url":null,"abstract":"Data stream processing is becoming essential in most current advanced scientific or business applications as data production rates are increasing. Different companies compete to efficiently ingest high velocity data and apply some form of computation in order to make better business decisions. In order to successfully compete in this environment, companies are focusing on the most recent data within a count or time-based window by continuously executing aggregate queries on it. Incremental sliding-window computation is commonly used to avoid the performance implications of re-evaluating the aggregate value of the window from scratch on every update. The state-of-the-art FlatFAT technique executes ACQs with high efficiency but it does not scale well with the increasing workloads. In this paper we propose a novel algorithm, FlatFIT, that accelerates such calculations by intelligently maintaining index structures, leading to higher reuse of intermediate calculations and thus exceptional scalability in systems with heavy workloads. Our theoretical analysis shows that FlatFIT is superior in both time and space complexities compared to FlatFAT, while maintaining the same query generality. Given a window of size n, FlatFIT achieves constant algorithmic complexity compared to O(log(n)) complexity of FlatFAT. We experimentally show that FlatFIT achieves up to a 17x throughput improvement over FlatFAT for the same input workload while using less memory.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"298 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114482059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing Isochrones in Multimodal Spatial Networks using Tile Regions","authors":"Nikolaus Krismer, Doris Silbernagl, Günther Specht, J. Gamper","doi":"10.1145/3085504.3085538","DOIUrl":"https://doi.org/10.1145/3085504.3085538","url":null,"abstract":"This paper describes a new method to compute isochrones in multimodal spatial networks, which aims at finding a good trade-off between memory usage and runtime. In the past, approaches based on Dijkstra's algorithm have been proposed. For small networks, the entire network is first loaded in main memory, where the network is expanded to determine the isochrone. For large networks that do not fit in main memory, approaches that load the network vertex-by-vertex during the expansion phase have been proposed. They keep the memory footprint minimal, but have to query the database for each node in the isochrone, which can be very time consuming. The method presented in this paper uses tiles (which are well known from interactive online maps) to realize chunk-loading of vertices by utilizing so-called tile regions. This approach significantly reduces the number of database requests, while keeping the memory usage low. Our method is able to compute isochrones even in large networks at a reasonable time. An experimental evaluation shows that the new algorithm clearly outperforms previous competitive approaches such as MINE and MINEX.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122238725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Location and Processing Aware Datacube Caching","authors":"Veranika Liaukevich, D. Misev, P. Baumann, Vlad Merticariu","doi":"10.1145/3085504.3085539","DOIUrl":"https://doi.org/10.1145/3085504.3085539","url":null,"abstract":"Array databases are used to manage and query large N-dimensional arrays, such as sensor data, simulation models and imagery, as well as various time-series. Modern database systems and database applications make extensive use of caching techniques to improve performance. Research on array databases on the other hand has not explored the potential benefits of caching in query processing on big arrays. In this work we propose a design for a content-aware cache for array databases which allows to reuse results of previously evaluated queries. Besides identical query matching, our method also takes into account spatially overlapping queries and queries with common subexpressions. We evaluate performance of the query cache implementation by varying data and query parameters and show that it decreases query execution time by up to 93%, with a potential for even higher savings with increasing query complexity.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114659452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating What-If Scenarios for Time Series Data","authors":"Lars Kegel, M. Hahmann, Wolfgang Lehner","doi":"10.1145/3085504.3085507","DOIUrl":"https://doi.org/10.1145/3085504.3085507","url":null,"abstract":"Time series data has become a ubiquitous and important data source in many application domains. Most companies and organizations strongly rely on this data for critical tasks like decision-making, planning, predictions, and analytics in general. While all these tasks generally focus on actual data representing organization and business processes, it is also desirable to apply them to alternative scenarios in order to prepare for developments that diverge from expectations or assess the robustness of current strategies. When it comes to the construction of such what-if scenarios, existing tools either focus on scalar data or they address highly specific scenarios. In this work, we propose a generally applicable and easy-to-use method for the generation of what-if scenarios on time series data. Our approach extracts descriptive features of a data set and allows the construction of an alternate version by means of filtering and modification of these features.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130265937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Persistent and Discriminative Communities in Graph Ensembles","authors":"Steve Harenberg, Mandar S. Chaudhary, N. Samatova","doi":"10.1145/3085504.3085532","DOIUrl":"https://doi.org/10.1145/3085504.3085532","url":null,"abstract":"Detecting all communities in a single graph is a prevalent task in graph data analytics. However, many scientific applications naturally create data as an ensemble of graphs. For example, graph ensembles can be created from multiple: social networks at distinct points in time, biological networks created from independent experiments, and global climate networks created from unique climate models. In this work, we present a method for enumerating community subsets across an ensemble of graphs, with the ability to detect both persistent and discriminative subcommunities. Moreover, we support queries, consisting of user-specified vertices of interest and arbitrary ensemble slices, to produce output that is more relevant to the user while reducing output size and computation time. While related methods are designed around a single community definition, our method is designed around the idea that choosing an appropriate community definition often depends on the application at hand. Therefore, our goal is to provide a framework that can leverage the abundance of community detection methods available when discovering persistent and discriminative substructures.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"348 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124297249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Definition Digital Elevation Model System Vision Paper","authors":"Andi Zang, Xin Chen, Goce Trajcevski","doi":"10.1145/3085504.3085533","DOIUrl":"https://doi.org/10.1145/3085504.3085533","url":null,"abstract":"Digital Elevation Modeling (DEM) has been a widely used methodology in plethora of application domains, ranging from climate and geological studies, through temporal evolution of various migration patterns, to Geographic Information Systems (GIS) broadly. However, the existing DEM methodologies and systems cannot quite straightforwardly be extended to catch up with the demands due to recent developments in autonomous driving, vehicle localization, drone and dynamically evolving high-definition smart city modeling. The new challenges are the demand of higher precision, sparse(r) elevation data compression, real-time efficient retrieval and intra-sources data integration. Motivated by this, we take a first step towards developing a tile based, multi-layer high precision DEM system, which aims at seamlessly integrating (and aligning) DEM from different sources, and enables context-driven variations in zoom levels. In addition, to further improve the efficiency of the focused-retrieval of the data necessary to construct the DEM with the desired quality assurance, our vision targets the collaborative compression among heterogeneous data sources.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117324042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Benchmark for Betweenness Centrality Approximation Algorithms on Large Graphs","authors":"Ziyad AlGhamdi, Fuad Jamour, Spiros Skiadopoulos, Panos Kalnis","doi":"10.1145/3085504.3085510","DOIUrl":"https://doi.org/10.1145/3085504.3085510","url":null,"abstract":"Betweenness centrality quantifies the importance of graph nodes in a variety of applications including social, biological and communication networks. Its computation is very costly for large graphs; therefore, many approximate methods have been proposed. Given the lack of a golden standard, the accuracy of most approximate methods is evaluated on tiny graphs and is not guaranteed to be representative of realistic datasets that are orders of magnitude larger. In this paper, we develop BeBeCA, a benchmark for betweenness centrality approximation methods on large graphs. Specifically: (i) We generate a golden standard by deploying a parallel implementation of Brandes algorithm using 96,000 CPU cores on a supercomputer to compute exact betweenness centrality values for several large graphs with up to 126M edges. (ii) We propose an evaluation methodology to assess various aspects of approximation accuracy, such as average error and quality of node ranking. (iii) We survey a large number of existing approximation methods and compare their performance and accuracy using our benchmark. (iv) We publicly share our benchmark, which includes the golden standard exact betweenness centrality values together with the scripts that implement our evaluation methodology; for researchers to compare their own algorithms and practitioners to select the appropriate algorithm for their application and data.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133935172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edge Labeling Schemes for Graph Data","authors":"Oshini Goonetilleke, Danai Koutra, T. Sellis, Kewen Liao","doi":"10.1145/3085504.3085516","DOIUrl":"https://doi.org/10.1145/3085504.3085516","url":null,"abstract":"Given a directed graph, how should we label both its outgoing and incoming edges to achieve better disk locality and support neighborhood-related edge queries? In this paper, we answer this question with edge-labeling schemes GrdRandom and FlipInOut, to label edges with integers based on the premise that edges should be assigned integer identifiers exploiting their consecutiveness to a maximum degree. We provide extensive experimental analysis on real-world graphs, and compare our proposed schemes with other labeling methods based on assigning edge IDs in the order of insertion or even randomly, as traditionally done. We show that our methods are efficient and result in significantly improved query I/O performance, including with indexes built on directed attributed edges. This ultimately leads to faster execution of neighborhood-related edge queries.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133070583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}