Proceedings of the 30th International Conference on Scientific and Statistical Database Management: Latest Articles

GPU-based parallel indexing for concurrent spatial query processing

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-07-09 DOI: 10.1145/3221269.3221296

Zhila Nouri, Yi-Cheng Tu

Abstract: In most spatial database applications, the input data is very large. Previous work has shown the importance of using spatial indexing and parallel computing to speed up such tasks. In recent years, GPUs have become a mainstream platform for massively parallel data processing. However, because of their complex hardware architecture and programming model, developing programs optimized for high performance on GPUs is non-trivial, and conventional wisdom geared towards CPU implementations is often found to be ineffective. Recent work on GPU-based spatial indexing has focused on parallelizing one individual query at a time. In this paper, we argue that the current one-query-at-a-time approach has low work efficiency and cannot make good use of GPU resources. To address these challenges, we present a framework named G-PICS for parallel processing of a large number of concurrent spatial queries over big datasets on GPUs. G-PICS is motivated by the fact that many spatial query processing applications are busy systems in which a large number of queries arrive per unit of time. G-PICS encapsulates an efficient parallel algorithm for constructing spatial trees on GPUs and supports major spatial query types such as spatial point search, range search, within-distance search, k-nearest neighbors, and spatial joins. While support for dynamic data inputs is missing from existing work, G-PICS provides an efficient parallel update procedure on GPUs. With the query processing, tree construction, and update procedure introduced, G-PICS shows great performance boosts over the best-known parallel GPU and parallel CPU-based spatial processing systems.

Citations: 8

Numerically stable parallel computation of (co-)variance

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-07-09 DOI: 10.1145/3221269.3223036

Erich Schubert, Michael Gertz

Abstract: With the advent of big data, we see an increasing interest in computing correlations in huge data sets with both many instances and many variables. Essential descriptive statistics such as the variance, standard deviation, covariance, and correlation can suffer from a numerical instability known as "catastrophic cancellation" that can lead to problems when naively computing these statistics with a popular textbook equation. While this instability was already discussed in the literature 50 years ago, we found that even today, some high-profile tools still employ the unstable version. In this paper, we study a popular incremental technique originally proposed by Welford, which we extend to weighted covariance and correlation. We also discuss strategies for further improving numerical precision, how to compute such statistics online on a data stream, with exponential aging, with missing data, and a batch parallelization for both high performance and numerical precision. We demonstrate when the numerical instability arises, and the performance of different approaches under these conditions. We showcase applications from the classic computation of variance as well as advanced applications such as stock market analysis with exponentially weighted moving models and Gaussian mixture modeling for cluster analysis that all benefit from this approach.
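
The incremental update attributed to Welford, and a pairwise merge of the kind needed for batch parallelization, can be sketched as follows. This is a minimal, unweighted Python sketch and not the authors' implementation: the class name `Moments`, its field names, and the helper `from_pairs` are hypothetical, and the weighted, exponentially aged, and missing-data variants mentioned in the abstract are left out.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Running means and centered (co-)moments of paired observations."""
    n: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    m2_x: float = 0.0   # sum of squared deviations of x from its mean
    m2_y: float = 0.0   # sum of squared deviations of y from its mean
    c_xy: float = 0.0   # sum of products of deviations of x and y

    def add(self, x: float, y: float) -> None:
        """Welford-style update with one new observation (x, y)."""
        self.n += 1
        dx = x - self.mean_x            # deviation from the OLD mean of x
        dy = y - self.mean_y            # deviation from the OLD mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Old-mean deviation times new-mean deviation keeps the sums centered.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def merge(self, other: "Moments") -> "Moments":
        """Combine two partial results, e.g. from two data partitions."""
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        dx = other.mean_x - self.mean_x
        dy = other.mean_y - self.mean_y
        w = self.n * other.n / n        # weight of the between-partition term
        return Moments(
            n=n,
            mean_x=self.mean_x + dx * other.n / n,
            mean_y=self.mean_y + dy * other.n / n,
            m2_x=self.m2_x + other.m2_x + dx * dx * w,
            m2_y=self.m2_y + other.m2_y + dy * dy * w,
            c_xy=self.c_xy + other.c_xy + dx * dy * w,
        )

    def variance_x(self) -> float:
        """Sample variance of x (requires n >= 2)."""
        return self.m2_x / (self.n - 1)

    def covariance(self) -> float:
        """Sample covariance of x and y (requires n >= 2)."""
        return self.c_xy / (self.n - 1)


def from_pairs(pairs) -> Moments:
    """Hypothetical helper: accumulate an iterable of (x, y) pairs."""
    acc = Moments()
    for x, y in pairs:
        acc.add(x, y)
    return acc
```

Each partition can be accumulated independently with `from_pairs` and the partial results combined with `merge`. Because only deviations from the running mean are accumulated, the sums stay small even when the data carries a large constant offset, which is the situation in which the textbook formula based on the difference of two large sums loses precision.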

Citations: 28

In-database analytics with ibmdbpy

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-07-09 DOI: 10.1145/3221269.3223026

Edouard Fouché, Alexander Eckert, Klemens Böhm

Citations: 5

Feature-based comparison and generation of time series

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-07-09 DOI: 10.1145/3221269.3221293

Lars Kegel, M. Hahmann, Wolfgang Lehner

Citations: 27

Selecting representative and diverse spatio-textual posts over sliding windows

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-07-09 DOI: 10.1145/3221269.3221290

Dimitris Sacharidis, Paras Mehta, Dimitrios Skoutas, Kostas Patroumpas, A. Voisard

Abstract: Thousands of posts are generated constantly by millions of users in social media, with an increasing portion of this content being geotagged. Keeping track of the whole stream of this spatio-textual content can easily become overwhelming for the user. In this paper, we address the problem of selecting a small, representative and diversified subset of posts, which is continuously updated over a sliding window. Each such subset can be considered as a concise summary of the stream's contents within the respective time interval, being dynamically updated every time the window slides to reflect newly arrived and expired posts. We define the criteria for selecting the contents of each summary, and we present several alternative strategies for summary construction and maintenance that provide different trade-offs between information quality and performance. Furthermore, we optimize the performance of our methods by partitioning the newly arriving posts spatio-textually and computing bounds for the coverage and diversity of the posts in each partition. The proposed methods are evaluated experimentally using real-world datasets containing geotagged tweets and photos.

Citations: 3

Scheduling data-intensive scientific workflows with reduced communication

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-07-09 DOI: 10.1145/3221269.3221298

Ilia Pietri, R. Sakellariou

Citations: 6

GeoSparkViz

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-07-09 DOI: 10.1145/3221269.3223040

Jia Yu, Zongsi Zhang, Mohamed Sarwat

Abstract: Data visualization allows users to summarize, analyze and reason about data. A map visualization tool first loads the designated geospatial data, processes the data and then applies the map visualization effect. Guaranteeing detailed and accurate geospatial map visualization (e.g., at multiple zoom levels) requires extremely high-resolution maps. Classic solutions suffer from limited computation resources and hence take a tremendous amount of time to generate maps for large-scale geospatial data. The paper presents GeoSparkViz, a large-scale geospatial map visualization framework. GeoSparkViz extends a cluster computing system (Apache Spark in our case) to provide native support for general cartographic design. The proposed system seamlessly integrates with a Spark-based spatial data management system, GeoSpark. It provides the data scientist with a holistic system that allows her to perform data management and visualization on spatial data, and it reduces the overhead of loading the intermediate spatial data generated during the data management phase into the designated map visualization tool. GeoSparkViz also proposes a map tile data partitioning method that achieves load balancing for the map visualization workloads among all nodes in the cluster. Extensive experiments show that GeoSparkViz can generate a high-resolution (i.e., gigapixel image) heatmap of 1.7 billion OpenStreetMap objects and 1.3 billion NYC taxi trips in approximately 4 and 5 minutes, respectively, on a four-node commodity cluster.

Citations: 2

NoSingles

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-07-09 DOI: 10.1145/3221269.3221291

Diana Popova, Naoto Ohsaka, K. Kawarabayashi, Alex Thomo

Citations: 12

PARADISO

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-07-09 DOI: 10.1145/3221269.3221299

Daniyal Kazempour, A. Beer, Johannes-Y. Lohrer, Daniel Kaltenthaler, T. Seidl

Citations: 6

Order-independent constraint-based causal structure learning for gaussian distribution models using GPUs

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-07-09 DOI: 10.1145/3221269.3221292

Christopher Schmidt, Johannes Huegle, M. Uflacker

Abstract: Learning the causal structures in high-dimensional datasets allows deriving advanced insights from observational data, thus creating the potential for new applications. One crucial limitation of state-of-the-art methods for learning causal relationships, such as the PC algorithm, is their long execution time. While, in the worst case, the execution time is exponential in the dimension of a given dataset, it is polynomial if the underlying causal structures are sparse. To address the long execution time, parallelized extensions of the algorithm have been developed that target the Central Processing Unit (CPU) as the primary execution device. While modern multicore CPUs expose a decent level of parallelism, coprocessors such as Graphics Processing Units (GPUs) are specifically designed to process thousands of data points in parallel, providing superior parallel processing capabilities compared to CPUs. In our work, we leverage the parallel processing power of GPUs to address the long execution time of the PC algorithm and develop an efficient GPU-accelerated implementation for Gaussian distribution models. Based on an experimental evaluation on various high-dimensional real-world gene expression datasets, we show that our GPU-accelerated implementation outperforms existing CPU-based versions by factors of up to 700.
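
The abstract does not say which conditional independence test is used. For Gaussian models, constraint-based learners such as the PC algorithm commonly test whether a partial correlation is zero using Fisher's z-transform, and the sketch below assumes that test. The function names, the default `alpha`, and the restriction to at most one conditioning variable are illustrative assumptions; the code runs on the CPU and is not the authors' GPU implementation.

```python
import math
from statistics import NormalDist


def partial_corr_1(r_ij: float, r_ik: float, r_jk: float) -> float:
    """First-order partial correlation of i and j given a single variable k."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def is_independent(r: float, n_samples: int, cond_size: int,
                   alpha: float = 0.05) -> bool:
    """Fisher z-test (assumed here): True if r is not significantly nonzero.

    r          -- (partial) correlation, strictly between -1 and 1
    n_samples  -- number of observations
    cond_size  -- number of variables in the conditioning set
    """
    dof = n_samples - cond_size - 3
    if dof <= 0:
        raise ValueError("too few samples for this conditioning set size")
    z = math.atanh(r)                               # Fisher's z-transform
    statistic = math.sqrt(dof) * abs(z)
    threshold = NormalDist().inv_cdf(1 - alpha / 2) # two-sided critical value
    return statistic <= threshold


# Example: a chain X -> Z -> Y with corr(X, Z) = corr(Z, Y) = 0.8,
# so corr(X, Y) = 0.64 and the X-Y dependence vanishes once Z is given.
r_xy_given_z = partial_corr_1(0.64, 0.8, 0.8)
keep_edge_level0 = not is_independent(0.64, n_samples=1000, cond_size=0)
keep_edge_level1 = not is_independent(r_xy_given_z, n_samples=1000, cond_size=1)
```

In a PC-style search, every pair of adjacent variables is tested against conditioning sets of growing size, and an edge is removed as soon as one test reports independence. The individual tests within a level do not depend on each other, which is the kind of workload that can be spread over many parallel threads.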

Citations: 15