M. V. Mehren, F. Gieseke, J. Verbesselt, Sabina Roşca, S. Horion, A. Zeileis
{"title":"Massively-parallel break detection for satellite data","authors":"M. V. Mehren, F. Gieseke, J. Verbesselt, Sabina Roşca, S. Horion, A. Zeileis","doi":"10.1145/3221269.3223032","DOIUrl":"https://doi.org/10.1145/3221269.3223032","url":null,"abstract":"The field of remote sensing is nowadays faced with huge amounts of data. While this offers a variety of exciting research opportunities, it also yields significant challenges regarding both computation time and space requirements. In practice, the sheer data volumes render existing approaches too slow for processing and analyzing all the available data. This work aims at accelerating BFAST, one of the state-of-the-art methods for break detection given satellite image time series. In particular, we propose a massively-parallel implementation for BFAST that can effectively make use of modern parallel compute devices such as GPUs. Our experimental evaluation shows that the proposed GPU implementation is up to four orders of magnitude faster than the existing publicly available implementation and up to ten times faster than a corresponding multi-threaded CPU execution. The dramatic decrease in running time renders the analysis of significantly larger datasets possible in seconds or minutes instead of hours or days. We demonstrate the practical benefits of our implementations given both artificial and real datasets.","PeriodicalId":365491,"journal":{"name":"Proceedings of the 30th International Conference on Scientific and Statistical Database Management","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127188567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multidimensional range queries on modern hardware","authors":"Stefan Sprenger, Patrick Schäfer, U. Leser","doi":"10.1145/3221269.3223031","DOIUrl":"https://doi.org/10.1145/3221269.3223031","url":null,"abstract":"Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and common wisdom told that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on data being held on disks, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs and data-parallel instruction sets. In this paper, we study the question whether and how much modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation indicates that, on current machines, scanning should be favored over parallel versions of classical MDIS even for very selective queries.","PeriodicalId":365491,"journal":{"name":"Proceedings of the 30th International Conference on Scientific and Statistical Database Management","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123644537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SBG-sketch: a self-balanced sketch for labeled-graph stream summarization","authors":"Mohamed S. Hassan, Bruno Ribeiro, Walid G. Aref","doi":"10.1145/3221269.3223030","DOIUrl":"https://doi.org/10.1145/3221269.3223030","url":null,"abstract":"Applications in various domains rely on processing graph streams, e.g., communication logs of a cloud-troubleshooting system, road-network traffic updates, and interactions on a social network. A labeled-graph stream refers to a sequence of streamed edges of distinct types that form a labeled graph. Due to the large volume and high velocity of these streams, it is often more practical to incrementally build a lossy-compressed version of the graph, and use this lossy version to approximately evaluate graph queries. Challenges arise when the queries are unknown in advance but are associated with filtering predicates based on edge labels. Surprisingly common, and especially challenging, are labeled-graph streams that have highly skewed and unpredictable label-distributions. This paper introduces Self-Balanced Graph Sketch (SBG-Sketch, for short), a graph sketch for summarizing and querying labeled-graph streams, coping with highly imbalanced labels. SBG-Sketch maintains synopsis for both the edge attributes as well as the topology of the streamed graph. SBG-Sketch allows efficient processing of traversal queries, e.g., reachability queries. Experimental results over a variety of real labeled-graph streams show SBG-Sketch to reduce the estimation errors of state-of-the-art methods by up to 99%.","PeriodicalId":365491,"journal":{"name":"Proceedings of the 30th International Conference on Scientific and Statistical Database Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126853748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}