33rd International Conference on Scientific and Statistical Database Management: Latest Publications

What is special about spatial data science and Geo-AI?

33rd International Conference on Scientific and Statistical Database Management Pub Date : 2021-07-06 DOI: 10.1145/3468791.3472263

S. Shekhar

Abstract: The importance of spatial data science and Geo-AI is growing with the rise of spatial and spatiotemporal big data (e.g., trajectories, remote-sensing images, census and geo-social media) [1-2]. Societal use cases include Agriculture (global crop monitoring, precision agriculture), Location-based services (e.g., navigation, ride-sharing), Public Health (e.g., monitoring disease spread), Environment and Climate (change detection, land-cover classification), Smart Cities (e.g., mapping buildings), etc. [1-2]. Classical data science and AI (e.g., machine learning) often perform poorly when applied to spatial data sets, for several reasons [1-5]. First, spatial data is embedded in a continuous space, and classical statistics (e.g., correlation) are not robust to the modifiable areal unit problem. Second, spatial data items have extended footprints (e.g., line strings, polygons) and implicit relationships (e.g., distance, touch). Third, the high cost of spurious patterns requires guardrails (e.g., statistical significance tests) to reduce false positives. Furthermore, spatial autocorrelation and variability violate the classical assumption that data samples are generated independently from identical distributions, which risks producing models that are either inaccurate or inconsistent with the data. Thus, new methods are needed to analyze spatial data [1-5].
This talk surveys common and emerging methods for spatial classification and prediction (e.g., spatial autoregression, spatial decision trees [6], spatial variability aware neural networks [7]), as well as techniques for discovering interesting, useful and non-trivial patterns such as hotspots (e.g., circular, linear, arbitrary shapes [8]), interactions (e.g., co-locations [9], tele-connections), spatial outliers [10], and their spatio-temporal counterparts [3].
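To make the autocorrelation point concrete, the sketch below computes Moran's I, a standard measure of spatial autocorrelation that the abstract does not itself name. It assumes locations arranged along a line with binary adjacency weights; the function names `morans_i` and `chain_neighbors` and the sample values are illustrative only.

```python
def morans_i(values, neighbors):
    """Moran's I for `values`, where neighbors[i] lists the indices adjacent to i.

    Binary weights are assumed (w_ij = 1 if j is a neighbor of i, else 0);
    this weighting scheme is chosen purely for illustration.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(len(adj) for adj in neighbors)
    # Sum of products of deviations over every (i, neighbor j) pair.
    cross = sum(dev[i] * dev[j] for i, adj in enumerate(neighbors) for j in adj)
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


def chain_neighbors(n):
    """Adjacency lists for n locations arranged along a line."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


smooth = [1, 2, 3, 4, 5]       # nearby values are similar
alternating = [1, 5, 1, 5, 1]  # nearby values are dissimilar
print(morans_i(smooth, chain_neighbors(5)))
print(morans_i(alternating, chain_neighbors(5)))
```

The smooth series yields a positive value and the alternating series a negative one. In both cases a sample carries information about its neighbors, so the samples cannot be treated as independent draws.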

Citations: 1

Caching Support for Range Query Processing on Bitmap Indices

33rd International Conference on Scientific and Statistical Database Management Pub Date : 2021-07-06 DOI: 10.1145/3468791.3468800

Sarah McClain, Manya Mutschler-Aldine, C. Monaghan, David Chiu, Jason Sawin, Patrick Jarvis

Abstract: Bitmaps are commonly used for indexing read-mostly data sets. The range of an attribute is split into bins into which its values are placed: b_ij = 1 denotes that the value of the i-th tuple falls in the j-th bin, and b_ij = 0 otherwise. A number of query types can be decomposed into the systematic application of Boolean operators over sets of bins. However, when bitmaps are high-dimensional, overall query-processing performance can deteriorate due to the increased number of bins that participate in each query. We propose a caching framework that organizes, manages, and integrates cached partial results to accelerate query processing on high-dimensional bitmaps. We begin by showing that, to resolve general complex disjunctive and conjunctive queries, the selection of an optimal set of partial bitmap results is NP-complete. Restricting the problem to consecutive bin sequences (characteristic of common range and point queries) allows us to solve it efficiently.
We present an evaluation of our caching system over several workloads run on the TPC-H benchmark and a real network-intrusion data set.
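The binning scheme and the role of cached partial results can be illustrated with a small sketch. This is not the authors' caching framework: it is a naive illustration in which Python integers stand in for bit vectors, and the names `build_bitmap_index`, `range_query`, and `matching_rows`, as well as the prefix-reuse cache policy, are assumptions made for the example.

```python
from bisect import bisect_right


def build_bitmap_index(values, bin_edges):
    """Return one bitmap per bin; bin j covers [bin_edges[j], bin_edges[j + 1]).

    Bit i of bitmap j is 1 exactly when tuple i falls in bin j (b_ij = 1).
    """
    bitmaps = [0] * (len(bin_edges) - 1)
    for i, v in enumerate(values):
        j = bisect_right(bin_edges, v) - 1
        if not 0 <= j < len(bitmaps):
            raise ValueError(f"value {v} lies outside the binned range")
        bitmaps[j] |= 1 << i
    return bitmaps


def range_query(bitmaps, lo, hi, cache):
    """OR together bins lo..hi (inclusive), reusing a cached partial result.

    Hypothetical policy: reuse the longest cached run of consecutive bins
    that starts at `lo` and ends at or before `hi`, then OR in the rest.
    """
    start, result = lo, 0
    for end in range(hi, lo - 1, -1):
        if (lo, end) in cache:
            result, start = cache[(lo, end)], end + 1
            break
    for j in range(start, hi + 1):
        result |= bitmaps[j]
    cache[(lo, hi)] = result
    return result


def matching_rows(bitmap, n_rows):
    """Decode a result bitmap into the list of matching tuple positions."""
    return [i for i in range(n_rows) if (bitmap >> i) & 1]


values = [5, 12, 27, 33, 18, 41]
bitmaps = build_bitmap_index(values, [0, 10, 20, 30, 40, 50])
cache = {}
print(matching_rows(range_query(bitmaps, 1, 2, cache), len(values)))  # [1, 2, 4]
print(matching_rows(range_query(bitmaps, 1, 3, cache), len(values)))  # [1, 2, 3, 4]
```

The second query touches only one bitmap, because the result for bins 1..2 is already cached. Restricting the cache keys to runs of consecutive bins is what keeps the lookup simple here.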

Citations: 0

MISE: An Array-Based Integrated System for Atmospheric Scanning LiDAR

33rd International Conference on Scientific and Statistical Database Management Pub Date : 2021-07-06 DOI: 10.1145/3468791.3468829

Kyoseung Koo, Juhun Kim, Bongki Moon

Citations: 1

MAMBO - Indexing Dead Space to Accelerate Spatial Queries

33rd International Conference on Scientific and Statistical Database Management Pub Date : 2021-07-06 DOI: 10.1145/3468791.3468804

Giannis Evagorou, T. Heinis

Abstract: With the increasing size and prevalence of spatial data across applications, indexing it efficiently becomes key. Minimum bounding boxes (MBBs), i.e., axis-aligned rectangles that minimally enclose an object, are used as approximations of complex geometric objects and have become crucial for spatial indexes. MBBs succinctly summarize complex spatial objects and thus allow for an efficient filtering stage thanks to faster intersection tests. However, they introduce dead space, i.e., space that is indexed but contains no spatial objects. Querying dead space yields no results but still reads data from disk, slowing down query execution unnecessarily. In this paper, we propose MaMBo (Meshed MBb), a grid-based data structure that indexes dead space alongside an index of the spatial objects. We augment the intersection operations of established indexes to consult our data structure while executing queries, thereby avoiding retrieval of unnecessary data from disk, i.e., data that contains only dead space.
As our experiments show, using MaMBo with an R-Tree reduces I/O, the major overhead for disk-resident datasets, by over 50%.
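The dead-space idea can be sketched as follows. This is a simplified illustration and not the authors' MaMBo implementation: it assumes an object made of axis-aligned rectangular parts, overlays a uniform grid on its MBB, and records which cells the object touches. The names `Rect`, `DeadSpaceGrid`, and `may_contain_result` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other):
        """Closed-interval overlap test; touching edges count as overlap."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


class DeadSpaceGrid:
    """Uniform grid over an object's MBB that marks the cells the object occupies."""

    def __init__(self, parts, cells_per_side=4):
        self.mbb = Rect(min(p.xmin for p in parts), min(p.ymin for p in parts),
                        max(p.xmax for p in parts), max(p.ymax for p in parts))
        self.n = cells_per_side
        self.occupied = {
            (cx, cy)
            for cx in range(self.n) for cy in range(self.n)
            if any(self._cell(cx, cy).intersects(p) for p in parts)
        }

    def _cell(self, cx, cy):
        w = (self.mbb.xmax - self.mbb.xmin) / self.n
        h = (self.mbb.ymax - self.mbb.ymin) / self.n
        return Rect(self.mbb.xmin + cx * w, self.mbb.ymin + cy * h,
                    self.mbb.xmin + (cx + 1) * w, self.mbb.ymin + (cy + 1) * h)

    def may_contain_result(self, query):
        """False means the query hits only dead space, so the fetch can be skipped."""
        if not self.mbb.intersects(query):
            return False
        return any(self._cell(cx, cy).intersects(query) for cx, cy in self.occupied)


# An L-shaped object: its MBB is 8 x 8, but most of that area is dead space.
l_shape = [Rect(0, 0, 8, 1), Rect(0, 0, 1, 8)]
grid = DeadSpaceGrid(l_shape)
print(grid.may_contain_result(Rect(5, 5, 7, 7)))      # False
print(grid.may_contain_result(Rect(0.5, 6, 1.5, 7)))  # True
```

The first query overlaps the MBB, so a plain MBB filter would fetch the object, yet it touches no occupied cell. The check is conservative: a cell counts as occupied if the object merely touches it, so the grid never prunes a query that has a real result.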

Citations: 0

ArrayQL for Linear Algebra within Umbra

33rd International Conference on Scientific and Statistical Database Management Pub Date : 2021-07-06 DOI: 10.1145/3468791.3468838

Maximilian E. Schüle, T. Götz, A. Kemper, Thomas Neumann

Citations: 6

Distributed Enumeration of Four Node Graphlets at Quadrillion-Scale

33rd International Conference on Scientific and Statistical Database Management Pub Date : 2021-07-06 DOI: 10.1145/3468791.3468805

Xiaozhou Liu, Yudi Santoso, Venkatesh Srinivasan, Alex Thomo

Citations: 1

Online Landmark-Based Batch Processing of Shortest Path Queries

33rd International Conference on Scientific and Statistical Database Management Pub Date : 2021-07-06 DOI: 10.1145/3468791.3468844

Manuel Hotz, Theodoros Chondrogiannis, Leonard Wörteler, Michael Grossniklaus

Citations: 0

DJEnsemble: a Cost-Based Selection and Allocation of a Disjoint Ensemble of Spatio-temporal Models

33rd International Conference on Scientific and Statistical Database Management Pub Date : 2021-07-06 DOI: 10.1145/3468791.3468806

R. S. Pereira, Y. M. Souto, A. Silva, Rocio Zorilla, Brian Tsan, Florin Rusu, Eduardo S. Ogasawara, A. Ziviani, F. Porto

Abstract: Consider a set of black-box models, each independently trained on a different dataset, answering the same predictive spatio-temporal query. Being built in isolation, each model traverses its own life-cycle until it is deployed to production, learning data patterns from different datasets and undergoing independent hyper-parameter tuning. In order to answer the query, the set of black-box predictors has to be ensembled and allocated to the spatio-temporal query region. However, computing an optimal ensemble is a complex task that involves selecting the appropriate models and defining an effective allocation strategy that maps the models to the query region. In this paper we present DJEnsemble, a cost-based strategy for the automatic selection and allocation of a disjoint ensemble of black-box predictors to answer predictive spatio-temporal queries. We conduct a set of extensive experiments that evaluate DJEnsemble and highlight its efficiency, selecting model ensembles that are almost as efficient as the optimal solution.
When compared against the traditional ensemble approach, DJEnsemble achieves up to 4X improvement in execution time and almost 9X improvement in prediction accuracy.

Citations: 6

In-Database Machine Learning with SQL on GPUs

33rd International Conference on Scientific and Statistical Database Management Pub Date : 2021-07-06 DOI: 10.1145/3468791.3468840

Maximilian E. Schüle, Harald Lang, M. Springer, A. Kemper, Thomas Neumann, Stephan Günnemann

Abstract: In machine learning, continuously retraining a model guarantees accurate predictions based on the latest data as training input. But to retrieve the latest data from a database, time-consuming extraction is necessary, as database systems have rarely been used for operations such as matrix algebra and gradient descent. In this work, we demonstrate that SQL with recursive tables makes it possible to express a complete machine learning pipeline consisting of data preprocessing, model training and validation. To facilitate the specification of loss functions, we extend the code-generating database system Umbra with an operator for automatic differentiation for use within recursive tables: with the loss function expressed in SQL as a lambda function, Umbra generates machine code for each partial derivative. We further use automatic differentiation for a dedicated gradient descent operator, which generates LLVM code to train a user-specified model on GPUs. We fine-tune GPU kernels at hardware level to allow a higher throughput and propose non-blocking synchronisation of multiple units. In our evaluation, automatic differentiation accelerated the runtime by the number of cached subexpressions compared to compiling each derivative separately.
Our GPU kernels with independent models allowed maximal throughput even for small batch sizes, making machine learning pipelines within SQL more competitive.
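The recursive-table formulation of gradient descent can be mimicked in a few lines of Python. This is a sketch only, under several assumptions: it does not use Umbra, SQL, GPUs, or automatic differentiation; the model (a line y = a*x + b), the mean-squared-error loss, the hand-derived partial derivatives, and the function name `gradient_descent` are all chosen for illustration. What it shares with the approach described above is the shape of the computation: each row of weights is derived solely from the previous row, as a recursive table would derive it.

```python
def gradient_descent(xs, ys, learning_rate=0.1, iterations=2000):
    """Fit y = a*x + b by minimizing mean squared error.

    Returns a list of (iteration, a, b) rows. Each row is computed only from
    the row before it, mirroring the step of a recursive table.
    """
    n = len(xs)
    rows = [(0, 0.0, 0.0)]  # base case: initial weights
    for it in range(1, iterations + 1):
        _, a, b = rows[-1]
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # Hand-derived partial derivatives of the mean squared error.
        grad_a = 2 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2 / n * sum(residuals)
        rows.append((it, a - learning_rate * grad_a, b - learning_rate * grad_b))
    return rows


# Training data generated from y = 2x + 1.
rows = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
_, a, b = rows[-1]
print(round(a, 4), round(b, 4))
```

In a system with automatic differentiation, the two gradient lines would be generated from the loss expression instead of being written by hand.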

Citations: 18

Automatic Selection of Analytic Platforms with ASAP-DM

33rd International Conference on Scientific and Statistical Database Management Pub Date : 2021-07-06 DOI: 10.1145/3468791.3468802

M. Fritz, Gang Shao, H. Schwarz

Citations: 0