Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management: latest publications

A study of partitioning and parallel UDF execution with the SAP HANA database

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618274

Philippe Grosse, Norman May, Wolfgang Lehner

Citations: 14

Mining statistically sound co-location patterns at multiple distances

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618261

Sajib Barua, J. Sander

Abstract: Existing co-location mining algorithms require a user-provided distance threshold at which prevalent patterns are searched. Since spatial interactions may in reality happen at different distances, finding the right distance threshold to mine all true patterns is not easy, and a single appropriate threshold may not even exist. A standard co-location mining algorithm also requires a prevalence measure threshold to find prevalent patterns. The prevalence measure values of true co-location patterns occurring at different distances may vary, and finding a prevalence measure threshold that mines all true patterns without reporting random patterns is not easy and sometimes not even possible. In this paper, we propose an algorithm to mine true co-location patterns at multiple distances. Our approach is based on a statistical test and requires thresholds for neither the prevalence measure nor the interaction distance. We evaluate the efficacy of our algorithm on synthetic and real data sets, comparing it with the state-of-the-art co-location mining approach.

Pages: 7:1-7:12

Citations: 9
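The abstract's point that a pattern's prevalence depends on the interaction distance can be illustrated with the participation index, a standard prevalence measure in co-location mining (assumed here; the abstract does not name the measure). The Python sketch below computes it for a two-feature pattern {A, B} at several distances and adds a generic Monte Carlo randomization test. It is not the authors' algorithm: the null model (B placed uniformly at random in a bounding box), the function names and the brute-force neighbour search are all illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def participation_index(a_pts: List[Point], b_pts: List[Point], d: float) -> float:
    """Participation index of the two-feature pattern {A, B} at distance threshold d.

    PI = min(fraction of A instances with a B neighbour within d,
             fraction of B instances with an A neighbour within d).
    Brute-force O(|A| * |B|) neighbour search, fine for a toy example.
    """
    if not a_pts or not b_pts:
        return 0.0

    def has_neighbour(p: Point, others: List[Point]) -> bool:
        return any(math.dist(p, q) <= d for q in others)

    pr_a = sum(has_neighbour(p, b_pts) for p in a_pts) / len(a_pts)
    pr_b = sum(has_neighbour(q, a_pts) for q in b_pts) / len(b_pts)
    return min(pr_a, pr_b)


def monte_carlo_p_value(a_pts, b_pts, d, box, n_sim=199, seed=0) -> float:
    """Share of random B layouts whose PI is at least the observed PI.

    Hypothetical null model: B instances are placed uniformly at random in
    box = (xmin, ymin, xmax, ymax) while A stays fixed.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = box
    observed = participation_index(a_pts, b_pts, d)
    hits = 0
    for _ in range(n_sim):
        rand_b = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in b_pts]
        if participation_index(a_pts, rand_b, d) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    A = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
    B = [(1.0, 0.0), (12.0, 0.0), (40.0, 0.0)]
    for d in (1.0, 2.0, 20.0):
        pi = participation_index(A, B, d)
        p = monte_carlo_p_value(A, B, d, box=(0.0, -1.0, 40.0, 1.0))
        print(f"d={d}: PI={pi:.3f}, p={p:.3f}")
```

On this toy data the index rises from 1/3 at d = 1 to 2/3 at d = 2 and 1.0 at d = 20, which shows why one fixed distance and one fixed prevalence threshold are hard to choose together.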

Detecting correlated columns in relational databases with mixed data types

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618251

H. Nguyen, Emmanuel Müller, Periklis Andritsos, Klemens Böhm

Abstract: In a database, besides known dependencies among columns (e.g., foreign key and primary key constraints), there are many other correlations unknown to the database users. Extracting such hidden correlations is known to be useful for various tasks in database optimization and data analytics. However, the task is challenging due to the lack of measures to quantify column correlations. Correlations may exist among columns of different data types and value domains, which makes techniques based on value matching inapplicable. Besides, a column may have multiple semantics, which does not allow disjoint partitioning of columns. Finally, from a computational perspective, one has to consider a huge search space that grows exponentially with the number of columns.

In this paper, we present a novel method for detecting column correlations (DeCoRel). It aims at discovering overlapping groups of correlated columns with mixed data types in relational databases. To handle the heterogeneity of data types, we propose a new correlation measure that combines the good features of Shannon entropy and cumulative entropy. To address the huge search space, we introduce an efficient algorithm for the column grouping. Compared to state-of-the-art techniques, we show our method to be more general than one of the most recent approaches in the database literature. Experiments reveal that our method achieves both higher quality and better scalability than existing techniques.

Pages: 30:1-30:12

Citations: 14
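The DeCoRel measure is said to combine Shannon entropy and cumulative entropy. Shannon entropy is the usual choice for categorical columns, while cumulative entropy, defined on the cumulative distribution function as -∫ F(x) log F(x) dx, applies to numeric ones; this assignment of measure to data type is an assumption here. The sketch below only computes the two ingredients for a single column with plain plug-in estimators. How the paper combines them into a correlation measure, and its column-grouping search, are not reproduced, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy (natural log) of a categorical column."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def cumulative_entropy(values: Sequence[float]) -> float:
    """Empirical cumulative entropy -∫ F(x) log F(x) dx of a numeric column.

    F is the empirical CDF: between the i-th and (i+1)-th smallest values it
    equals i/n, and F log F vanishes where F is 0 or 1, so the integral
    reduces to a sum over the gaps between consecutive sorted values.
    """
    xs = sorted(values)
    n = len(xs)
    total = 0.0
    for i in range(1, n):
        f = i / n
        total -= (xs[i] - xs[i - 1]) * f * math.log(f)
    return total


if __name__ == "__main__":
    print(shannon_entropy(["a", "a", "b", "b"]))  # ln 2 ≈ 0.693
    print(cumulative_entropy([0.0, 1.0, 2.0, 3.0]))
```

Unlike discrete Shannon entropy, which ignores the actual values, cumulative entropy uses the gaps between them: it is unchanged by shifting a column and scales linearly when the column is scaled.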

New approaches to storing and manipulating multi-dimensional sparse arrays

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618281

E. Otoo, Hairong Wang, Gideon Nimako

Abstract: In this paper, we introduce some storage schemes for multi-dimensional sparse arrays (MDSAs) that handle the sparsity of the array with two primary goals: reducing the storage overhead and maintaining efficient data element access. Four schemes are proposed: i.) the PATRICIA trie compressed storage method (PTCS), which uses a PATRICIA trie to store the valid non-zero array elements; ii.) the extended compressed row storage (xCRS), which extends the CRS method for sparse matrix storage to sparse arrays of higher dimensions and achieves the best data element access efficiency of all the methods; iii.) the bit-encoded xCRS (BxCRS), which optimizes the storage utilization of xCRS by applying data compression with run-length encoding while maintaining its data access efficiency; and iv.) a hybrid approach that provides a desired balance between storage utilization and data manipulation efficiency by combining xCRS and the Bit Encoded Sparse Storage (BESS). These storage schemes were evaluated and compared on three basic array operations (constructing the storage scheme, accessing a random element, and retrieving a sub-array) using a set of synthetic sparse multi-dimensional arrays.

Pages: 41:1-41:4

Citations: 6
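xCRS is described as an extension of compressed row storage (CRS) to higher dimensions. One plausible way to do that, sketched below under that assumption (the paper's exact layout may differ), is to linearize all but the last index into a "row" number, keep a CRS-style row-pointer array, and store the last index of each non-zero as its column index. A random access then costs one pointer lookup plus a binary search within the row. The class name `XCRSSketch` is hypothetical, and the run-length and bit encodings of BxCRS are not shown.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


class XCRSSketch:
    """CRS generalised to k dims: leading k-1 indices form the 'row', the last index is the 'column'."""

    def __init__(self, shape: Tuple[int, ...], entries: Dict[Tuple[int, ...], float]):
        self.shape = shape
        n_rows = 1
        for s in shape[:-1]:
            n_rows *= s
        # Keep only non-zeros, sorted by (row id, column) so each row is contiguous.
        items = sorted(
            (self._row_id(idx[:-1]), idx[-1], v) for idx, v in entries.items() if v != 0
        )
        # row_ptr[r] .. row_ptr[r + 1] delimits row r inside col_idx / values.
        self.row_ptr: List[int] = [0] * (n_rows + 1)
        for r, _, _ in items:
            self.row_ptr[r + 1] += 1
        for r in range(n_rows):
            self.row_ptr[r + 1] += self.row_ptr[r]
        self.col_idx: List[int] = [c for _, c, _ in items]
        self.values: List[float] = [v for _, _, v in items]

    def _row_id(self, lead: Tuple[int, ...]) -> int:
        # Row-major linearisation of the leading k-1 indices.
        r = 0
        for i, s in zip(lead, self.shape[:-1]):
            r = r * s + i
        return r

    def get(self, idx: Tuple[int, ...]) -> float:
        """Random element access: binary search for the last index within its row."""
        r = self._row_id(idx[:-1])
        lo, hi = self.row_ptr[r], self.row_ptr[r + 1]
        pos = bisect_left(self.col_idx, idx[-1], lo, hi)
        if pos < hi and self.col_idx[pos] == idx[-1]:
            return self.values[pos]
        return 0.0


if __name__ == "__main__":
    arr = XCRSSketch((2, 3, 4), {(0, 0, 1): 5.0, (0, 0, 3): 2.0, (1, 2, 3): 7.0})
    print(arr.row_ptr)  # [0, 2, 2, 2, 2, 2, 3]
    print(arr.get((1, 2, 3)), arr.get((0, 0, 2)))  # 7.0 0.0
```

The row-pointer array grows with the product of the leading dimensions even when most rows are empty, which is the kind of overhead that compressed variants such as BxCRS would target.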

Simulation workflow design tailor-made for scientists

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618291

P. Reimann, H. Schwarz

Citations: 4

Point cloud databases

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618275

L. Dobos, I. Csabai, J. Szalai-Gindl, T. Budavári, A. Szalay

Abstract: We introduce the concept of the point cloud database, a new kind of database system aimed primarily at scientific applications. Many scientific observations, experiments, feature extraction algorithms and large-scale simulations produce enormous amounts of data that are better represented as sparse (but often highly clustered) points in a k-dimensional (k ≲ 10) metric space than on a multi-dimensional grid. Dimensionality reduction techniques, such as principal components, are also widely used to project high-dimensional data into similarly low-dimensional spaces. Analysis techniques developed to work on multi-dimensional data points are usually implemented as in-memory algorithms and need to be modified to work in distributed cluster environments and on large amounts of disk-resident data. We conclude that the relational model, with certain additions, is appropriate for point clouds, but point cloud databases must also provide a unique set of spatial search and proximity join operators, indexing schemes, and query language constructs that make them a distinct class of database systems.

Pages: 33:1-33:4

Citations: 4
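The abstract lists proximity joins among the operators a point cloud database must provide. A common in-memory way to compute an ε-distance join in k dimensions is a uniform grid with cell side ε: two points within distance ε differ by at most ε in every coordinate, so their cells differ by at most one step per axis. The sketch below is that generic grid technique, not the operator proposed in the paper; the function name and the brute-force distance check inside candidate cells are illustrative.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]


def proximity_join(left: Sequence[Vec], right: Sequence[Vec], eps: float) -> List[Tuple[int, int]]:
    """Return sorted index pairs (i, j) with dist(left[i], right[j]) <= eps (eps > 0)."""

    def cell(p: Vec) -> Tuple[int, ...]:
        return tuple(math.floor(x / eps) for x in p)

    # Bucket the right-hand points by grid cell.
    grid: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for j, q in enumerate(right):
        grid[cell(q)].append(j)

    pairs = []
    for i, p in enumerate(left):
        c = cell(p)
        # Any match lies in the same cell or one of its 3^k - 1 neighbours.
        for offset in itertools.product((-1, 0, 1), repeat=len(c)):
            key = tuple(a + b for a, b in zip(c, offset))
            for j in grid.get(key, ()):
                if math.dist(p, right[j]) <= eps:
                    pairs.append((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    left = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
    right = [(0.5, 0.0, 0.0), (5.0, 5.0, 6.5), (10.0, 10.0, 10.0)]
    print(proximity_join(left, right, 1.0))  # [(0, 0)]
    print(proximity_join(left, right, 2.0))  # [(0, 0), (1, 1)]
```

Each probe visits 3^k cells, so this approach is only practical for small k and for data that fits in memory; the abstract's point is precisely that disk-resident, distributed data needs dedicated operators and indexes.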

DistillFlow: removing redundancy in scientific workflows

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618287

Jiuqiang Chen, Sarah Cohen Boulakia, C. Froidevaux, C. Goble, P. Missier, Alan R. Williams

Citations: 3

Subspace anytime stream clustering

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618286

Marwan Hassani, P. Kranen, Rajveer Saini, T. Seidl

Citations: 23

Communication-efficient preference top-k monitoring queries via subscriptions

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618284

Kamalas Udomlamlert, T. Hara, S. Nishio

Abstract: With the increase of data generated in distributed fashion, as in peer-to-peer systems and sensor networks, top-k query processing, which returns only a small set of data that satisfies many users' preferences, becomes a substantial issue. When data are periodically updated in each epoch (e.g., weather information), a naive solution without any optimization technique is to aggregate all data and their updates to ensure the correctness of final answers; however, this is too costly in terms of data transfer, especially for data aggregator nodes. In this paper, we propose a top-k monitoring query processing method in 2-tier distributed systems based on a publish-subscribe scheme. A set of top-k subscriptions specifying the summary scope of users' interests is communicated to aggregators to limit the number of data records transferred in each epoch. In addition, instead of issuing subscriptions for all queries, our method identifies a small set of minimal subscriptions, resulting in lower communication overhead. Our experiments show that our technique is efficient and outperforms other comparative reactive methods.

Pages: 44:1-44:4

Citations: 2
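The subscription idea can be illustrated as threshold filtering: a subscription carries a user's preference weights and a score threshold, and each aggregator forwards only records scoring at or above it. If the threshold does not exceed the true k-th best score, every top-k record is forwarded, so (up to ties) the top-k answer computed from the forwarded records equals the answer over all data while fewer records cross the network. Linear scoring, the record layout and how the threshold is obtained are assumptions for illustration; the paper's computation of minimal subscriptions is not reproduced.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, Tuple[float, ...]]  # (record id, attribute vector)


def score(weights: Sequence[float], attrs: Sequence[float]) -> float:
    """Linear preference score (an assumed scoring function)."""
    return sum(w * a for w, a in zip(weights, attrs))


def top_k(records: Sequence[Record], weights: Sequence[float], k: int) -> List[str]:
    """Ids of the k highest-scoring records."""
    ranked = sorted(records, key=lambda r: score(weights, r[1]), reverse=True)
    return [rid for rid, _ in ranked[:k]]


def forward(local: Sequence[Record], weights: Sequence[float], threshold: float) -> List[Record]:
    """Aggregator side: send only records that reach the subscription threshold."""
    return [r for r in local if score(weights, r[1]) >= threshold]


if __name__ == "__main__":
    agg1 = [("a", (0.9, 0.1)), ("b", (0.2, 0.3)), ("c", (0.6, 0.8))]
    agg2 = [("d", (0.7, 0.6)), ("e", (0.1, 0.1))]
    weights, k = (0.5, 0.5), 2
    threshold = 0.6  # hypothetical: assumed not to exceed the true k-th best score
    sent = forward(agg1, weights, threshold) + forward(agg2, weights, threshold)
    print(len(sent), "of", len(agg1) + len(agg2), "records sent")
    print(top_k(sent, weights, k) == top_k(agg1 + agg2, weights, k))
```

The saving depends entirely on how tight the threshold is: a threshold that is too low forwards everything, one that is too high loses answers.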

Local context selection for outlier ranking in graphs with multiple numeric node attributes

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618266

Patricia Iglesias Sánchez, Emmanuel Müller, Oretta Irmler, Klemens Böhm

Abstract: Outlier ranking aims at distinguishing exceptional outliers from regular objects by measuring the deviation of individual objects. In graphs with multiple numeric attributes, not all attributes are relevant or show dependencies with the graph structure. When both the graph structure and all given attributes are considered, no clear deviation of objects can be measured, because irrelevant attributes hinder the detection of outliers. Thus, one has to select local outlier contexts that include only those attributes showing a high contrast between regular and deviating objects. Detecting meaningful local contexts for each node in attributed graphs is an open challenge.

In this work, we propose a novel local outlier ranking model for graphs with multiple numeric node attributes. For each object, our technique locally determines its subgraph and its statistically relevant subset of attributes. This context selection enables a high contrast between an outlier and the regular objects. From this context, we compute the outlierness score by incorporating both the attribute value deviation and the graph structure. In our evaluation on real and synthetic data, we show that our approach is able to detect contextual outliers that are missed by other outlier models.

Pages: 16:1-16:12

Citations: 42
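The idea of scoring each node within a local context (a subgraph plus a relevant attribute subset) can be illustrated with a deliberately simple stand-in: take a node's direct neighbours as the context, measure each attribute's deviation as a z-score against those neighbours, keep only attributes whose deviation reaches a cut-off, and average the kept deviations. The neighbourhood choice, the z-score test and the `z_min` cut-off are assumptions; the paper's statistical attribute selection and its scoring, which also incorporates graph structure, are not reproduced.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def local_outlier_score(
    node: int,
    adj: Dict[int, List[int]],
    attrs: Dict[int, Sequence[float]],
    z_min: float = 2.0,
) -> Tuple[float, List[int]]:
    """Return (score, selected attribute indices) for one node.

    Context = the node's direct neighbours. An attribute is selected when the
    node deviates from its neighbours by at least z_min standard deviations;
    the score is the mean |z| over selected attributes (0.0 if none).
    Attributes with zero spread among the neighbours are skipped.
    """
    neigh = adj.get(node, [])
    if len(neigh) < 2:
        return 0.0, []
    selected, devs = [], []
    for a, value in enumerate(attrs[node]):
        vals = [attrs[v][a] for v in neigh]
        sd = statistics.pstdev(vals)
        if sd == 0:
            continue
        z = abs(value - statistics.fmean(vals)) / sd
        if z >= z_min:
            selected.append(a)
            devs.append(z)
    return (statistics.fmean(devs) if devs else 0.0), selected


def rank_outliers(adj, attrs, z_min=2.0) -> List[Tuple[int, float]]:
    """All nodes ordered by descending local outlier score."""
    scores = {n: local_outlier_score(n, adj, attrs, z_min)[0] for n in adj}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Star graph: node 0 is linked to 1..4; attribute 0 deviates, attribute 1 does not.
    adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    attrs = {0: (10.0, 5.5), 1: (1.0, 5.0), 2: (2.0, 6.0), 3: (1.0, 5.0), 4: (2.0, 6.0)}
    print(local_outlier_score(0, adj, attrs))
    print(rank_outliers(adj, attrs))
```

In the example, node 0 is flagged only through attribute 0; including attribute 1, on which it is perfectly ordinary, would dilute the deviation, which is the effect of irrelevant attributes that the abstract describes.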