Proceedings of the 28th International Conference on Scientific and Statistical Database Management: Latest Papers, Page 3

Demonstrating KDBMS: A Knowledge-based Database Management System

Proceedings of the 28th International Conference on Scientific and Statistical Database Management Pub Date : 2016-07-18 DOI: 10.1145/2949689.2949714

Mohamed E. Khalefa, Sameh S. El-Atawy

Citations: 0

SciServer Compute: Bringing Analysis Close to the Data

Proceedings of the 28th International Conference on Scientific and Statistical Database Management Pub Date : 2016-07-18 DOI: 10.1145/2949689.2949700

Dmitry Medvedev, G. Lemson, M. Rippin

Abstract: SciServer Compute uses Jupyter notebooks running within server-side Docker containers attached to large relational databases and file storage to bring advanced analysis capabilities close to the data. SciServer Compute is a component of SciServer, a big-data infrastructure project developed at Johns Hopkins University that will provide a common environment for computational research. SciServer Compute integrates with large existing databases in the fields of astronomy, cosmology, turbulence, genomics, oceanography and materials science. These are accessible through the CasJobs service for direct SQL queries. SciServer Compute adds interactive server-side computational capabilities through notebooks in Python, R and MATLAB, an API for running asynchronous tasks, and a very large (hundreds of terabytes) scratch space for storing intermediate results. Science-ready results can be stored on a Dropbox-like service, SciDrive, for sharing with collaborators and dissemination to the public. Notebooks and batch jobs run inside Docker containers owned by the users. This provides security and isolation and allows flexible configuration of computational contexts through domain specific images and mounting of domain specific data sets. We present a demo that illustrates the capabilities of SciServer Compute: using Jupyter notebooks, performing analyses on data selections from diverse scientific fields, and running asynchronous jobs in a Docker container. The demo will highlight the data flow between file storage, database, and compute components.

Citations: 25

Regular Path Queries on Massive Graphs

Proceedings of the 28th International Conference on Scientific and Statistical Database Management Pub Date : 2016-07-18 DOI: 10.1145/2949689.2949711

Maurizio Nolé, C. Sartiani

Citations: 15

Framework for real-time clustering over sliding windows

Proceedings of the 28th International Conference on Scientific and Statistical Database Management Pub Date : 2016-07-18 DOI: 10.1145/2949689.2949696

Sobhan Badiozamany, Kjell Orsborn, T. Risch

Abstract: Clustering queries over sliding windows require maintaining cluster memberships that change as windows slide. To address this, the Generic 2-phase Continuous Summarization framework (G2CS) utilizes a generation-based window maintenance approach where windows are maintained over different time intervals. It provides algorithm-independent and efficient sliding mechanisms for clustering queries where the clustering algorithms are defined in terms of queries over cluster data represented as temporal tables. A particular challenge for real-time detection of a large number of rapidly evolving clusters is efficiently supporting smooth re-clustering in real time, i.e., minimizing the sliding time as window size increases and stride decreases. To efficiently support such re-clustering for clustering algorithms where deletion of expired data is not supported, e.g. BIRCH, G2CS includes a novel window maintenance mechanism called Sliding Binary Merge (SBM), which maintains several generations of intermediate window instances and does not require decremental cluster maintenance. To improve real-time sliding performance, G2CS uses generation-based multi-dimensional indexing. Extensive performance evaluation on both synthetic and real data shows that G2CS scales substantially better than related approaches.

Citations: 9
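
The G2CS abstract above points out that algorithms such as BIRCH cannot delete expired data, so windows are kept as intermediate instances that are merged rather than decremented. The following is a minimal sketch of that general idea only, not of the paper's Sliding Binary Merge: each stride-sized pane keeps an additive BIRCH-style clustering feature (count, linear sum, sum of squares), and the window summary is rebuilt by merging the live panes, so expiry drops a whole pane instead of subtracting points. The names `ClusterFeature` and `PaneWindow`, the 1-D data, and the pane layout are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterFeature:
    """Additive BIRCH-style summary of 1-D points: count, linear sum, sum of squares."""
    n: int = 0
    ls: float = 0.0
    ss: float = 0.0

    def add(self, x: float) -> "ClusterFeature":
        return ClusterFeature(self.n + 1, self.ls + x, self.ss + x * x)

    def merge(self, other: "ClusterFeature") -> "ClusterFeature":
        # Summaries are additive, so merging never needs the raw points.
        return ClusterFeature(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    @property
    def mean(self) -> float:
        return self.ls / self.n if self.n else 0.0


class PaneWindow:
    """Sliding window kept as a queue of per-stride pane summaries (hypothetical helper)."""

    def __init__(self, panes_per_window: int):
        # A bounded deque discards the oldest pane when a new one is appended.
        self.panes = deque(maxlen=panes_per_window)

    def slide(self, new_points) -> ClusterFeature:
        pane = ClusterFeature()
        for x in new_points:
            pane = pane.add(x)
        self.panes.append(pane)
        # Rebuild the window summary by merging live panes; nothing is subtracted.
        summary = ClusterFeature()
        for p in self.panes:
            summary = summary.merge(p)
        return summary


window = PaneWindow(panes_per_window=2)
first = window.slide([1.0, 3.0])
second = window.slide([5.0])
third = window.slide([7.0, 9.0])  # the pane holding 1.0 and 3.0 has expired
```

Rebuilding from every live pane costs time linear in the number of panes per slide; keeping several generations of pre-merged intermediates, as the abstract describes, is one way to reduce that cost.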

Efficient Feedback Collection for Pay-as-you-go Source Selection

Proceedings of the 28th International Conference on Scientific and Statistical Database Management Pub Date : 2016-07-18 DOI: 10.1145/2949689.2949690

Julio César Cortés Ríos, N. Paton, A. Fernandes, Khalid Belhajjame

Abstract: Technical developments, such as the web of data and web data extraction, combined with policy developments such as those relating to open government or open science, are leading to the availability of increasing numbers of data sources. Indeed, given these physical sources, it is then also possible to create further virtual sources that integrate, aggregate or summarise the data from the original sources. As a result, there is a plethora of data sources, from which a small subset may be able to provide the information required to support a task. The number and rate of change in the available sources is likely to make manual source selection and curation by experts impractical for many applications, leading to the need to pursue a pay-as-you-go approach, in which crowds or data consumers annotate results based on their correctness or suitability, with the resulting annotations used to inform, e.g., source selection algorithms. However, for pay-as-you-go feedback collection to be cost-effective, it may be necessary to select judiciously the data items on which feedback is to be obtained. This paper describes OLBP (Ordering and Labelling By Precision), a heuristics-based approach to the targeting of data items for feedback to support mapping and source selection tasks, where users express their preferences in terms of the trade-off between precision and recall. The proposed approach is then evaluated on two different scenarios, mapping selection with synthetic data, and source selection with real data produced by web data extraction. The results demonstrate a significant reduction in the amount of feedback required to reach user-provided objectives when using OLBP.

Citations: 12

Novel Data Reduction Based on Statistical Similarity

Proceedings of the 28th International Conference on Scientific and Statistical Database Management Pub Date : 2016-07-18 DOI: 10.1145/2949689.2949708

Dongeun Lee, A. Sim, Jaesik Choi, Kesheng Wu

Abstract: Applications such as scientific simulations and power grid monitoring are generating data so quickly that compression is essential to reduce storage requirements or transmission capacity. To achieve better compression, one is often willing to discard some repeated information. These lossy compression methods are primarily designed to minimize the Euclidean distance between the original data and the compressed data. But this measure of distance severely limits either reconstruction quality or compression performance. We propose a new class of compression method by redefining the distance measure with a statistical concept known as exchangeability. This approach captures essential features while reducing the storage requirement. In this paper, we report our design and implementation of such a compression method named IDEALEM. To demonstrate its effectiveness, we apply it to a set of power grid monitoring data, and show that it can reduce the volume of data much more than the best known compression method while maintaining the quality of the compressed data. In these tests, IDEALEM captures extraordinary events in the data, while its compression ratios can far exceed 100.

Citations: 20
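
The IDEALEM abstract above replaces Euclidean distance with a notion of statistical similarity. One way to picture this is to split the stream into fixed-size blocks and, whenever a new block looks as if it were drawn from the same distribution as a block already stored, keep only a reference to the stored block. The sketch below uses the two-sample Kolmogorov-Smirnov statistic with a fixed threshold as the similarity test; that choice, the threshold, the block size, and the function names are assumptions for illustration and are not taken from the abstract.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def compress(stream, block_size=4, threshold=0.5):
    """Return (stored, refs); refs[i] is the index of the stored block standing in for block i."""
    stored, refs = [], []
    for start in range(0, len(stream), block_size):
        block = stream[start:start + block_size]
        # Reuse the first stored block that is statistically similar (assumed test).
        match = next((i for i, s in enumerate(stored)
                      if ks_statistic(s, block) <= threshold), None)
        if match is None:
            stored.append(block)
            match = len(stored) - 1
        refs.append(match)
    return stored, refs


data = [1, 2, 3, 4,  2, 1, 4, 3,  10, 11, 12, 13,  3, 4, 1, 2]
stored, refs = compress(data)
```

Decoding replays `stored[r]` for each reference `r`, so the order of values inside a reused block is not preserved. Under this sketch that is the point of treating values within a block as exchangeable: the distribution is kept, not the exact sequence.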

Array Database Scalability: Intercontinental Queries on Petabyte Datasets

Proceedings of the 28th International Conference on Scientific and Statistical Database Management Pub Date : 2016-07-18 DOI: 10.1145/2949689.2949717

A. Dumitru, Vlad Merticariu, P. Baumann

Citations: 11

Functional Dependencies Unleashed for Scalable Data Exchange

Proceedings of the 28th International Conference on Scientific and Statistical Database Management Pub Date : 2016-02-01 DOI: 10.1145/2949689.2949698

A. Bonifati, Ioana Ileana, Michele Linardi

Abstract: We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including the ones in Life Sciences in which multiple source relations need to be structured under a constrained target schema. However, despite their wide use, target fds' evaluation is still a bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL approach typically do not support target fds unless additional information is provided. Alternatively, DE engines that do include these dependencies typically pay the price of a significant drop in performance and scalability. In this paper, we present a novel chase-based algorithm that can efficiently handle arbitrary fds on the target. Our approach essentially relies on exploiting the interactions between source-to-target (s-t) tuple-generating dependencies (tgds) and target fds. This allows us to tame the size of the intermediate chase results through a careful ordering of chase steps that interleaves fds and (chosen) tgds. As a direct consequence, we significantly reduce the fd application scope, often a central cause of the dramatic overhead induced by target fds. Moreover, reasoning on dependency interaction further leads us to interesting parallelization opportunities, yielding additional scalability gains. We provide a proof-of-concept implementation of our chase-based algorithm and an experimental study aimed at gauging its scalability and efficiency. Finally, we empirically compare with the latest DE engines, and show that our algorithm outperforms them.

Citations: 20
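
The data exchange abstract above centres on applying target functional dependencies during the chase. As a minimal sketch of what a single fd chase step does on a target relation containing labeled nulls: when two tuples agree on the left-hand side, their right-hand-side values are equated, with a null replaced by the other value and two distinct constants causing the chase to fail. This is the textbook step only, not the paper's optimized interleaving of fds and tgds; `Null`, `chase_fd`, and the example relation are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Null:
    """Labeled null, as produced by a source-to-target tgd."""
    label: str


def chase_fd(rows, lhs, rhs):
    """Apply the fd lhs -> rhs to a list of dict rows until no step applies.

    Raises ValueError when two distinct constants would have to be equated.
    """
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        seen = {}
        for row in rows:
            key = tuple(row[a] for a in lhs)
            other = seen.setdefault(key, row)
            old, new = row[rhs], other[rhs]
            if old == new:
                continue
            if not isinstance(old, Null):
                if not isinstance(new, Null):
                    raise ValueError(f"fd violated: {old!r} vs {new!r}")
                old, new = new, old  # make sure the value being replaced is a null
            # Substitute the null everywhere it occurs, then restart the scan.
            for r in rows:
                for attr, value in r.items():
                    if value == old:
                        r[attr] = new
            changed = True
            break
    # Equating values can make tuples identical; keep one copy of each.
    unique = []
    for r in rows:
        if r not in unique:
            unique.append(r)
    return unique


target = [
    {"gene": "g1", "protein": Null("N1")},
    {"gene": "g1", "protein": "p53"},
    {"gene": "g2", "protein": Null("N2")},
]
result = chase_fd(target, lhs=("gene",), rhs="protein")
```

The scan restarts after every substitution, which is simple but quadratic in the worst case; the overhead of steps like this is the kind of cost the abstract says its ordering of chase steps is meant to contain.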

Proceedings of the 28th International Conference on Scientific and Statistical Database Management

Proceedings of the 28th International Conference on Scientific and Statistical Database Management Pub Date : 1900-01-01 DOI: 10.1145/2949689

Citations: 0