Proceedings of the 28th International Conference on Scientific and Statistical Database Management: Latest Publications

Demonstrating KDBMS: A Knowledge-based Database Management System
Mohamed E. Khalefa, Sameh S. El-Atawy
{"title":"Demonstrating KDBMS: A Knowledge-based Database Management System","authors":"Mohamed E. Khalefa, Sameh S. El-Atawy","doi":"10.1145/2949689.2949714","DOIUrl":"https://doi.org/10.1145/2949689.2949714","url":null,"abstract":"We demonstrate KDBMS, a prototype system that seamlessly integrates a knowledge base and a DBMS. State-of-the-art approaches, i.e., ontology-based data access (OBDA), use ontologies only to query data stored in relational databases using SPARQL. In this demo, we present a high-level description of the proposed system, introduce a new knowledge-based query language, denoted as KQL, and highlight query optimization opportunities that arise from employing knowledge across database layers in query optimization and query processing, while easing the administration of a complex database schema.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123788693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SciServer Compute: Bringing Analysis Close to the Data
Dmitry Medvedev, G. Lemson, M. Rippin
{"title":"SciServer Compute: Bringing Analysis Close to the Data","authors":"Dmitry Medvedev, G. Lemson, M. Rippin","doi":"10.1145/2949689.2949700","DOIUrl":"https://doi.org/10.1145/2949689.2949700","url":null,"abstract":"SciServer Compute uses Jupyter notebooks running within server-side Docker containers attached to large relational databases and file storage to bring advanced analysis capabilities close to the data. SciServer Compute is a component of SciServer, a big-data infrastructure project developed at Johns Hopkins University that will provide a common environment for computational research. SciServer Compute integrates with large existing databases in the fields of astronomy, cosmology, turbulence, genomics, oceanography and materials science. These are accessible through the CasJobs service for direct SQL queries. SciServer Compute adds interactive server-side computational capabilities through notebooks in Python, R and MATLAB, an API for running asynchronous tasks, and a very large (hundreds of terabytes) scratch space for storing intermediate results. Science-ready results can be stored on a Dropbox-like service, SciDrive, for sharing with collaborators and dissemination to the public. Notebooks and batch jobs run inside Docker containers owned by the users. This provides security and isolation and allows flexible configuration of computational contexts through domain specific images and mounting of domain specific data sets. We present a demo that illustrates the capabilities of SciServer Compute: using Jupyter notebooks, performing analyses on data selections from diverse scientific fields, and running asynchronous jobs in a Docker container. 
The demo will highlight the data flow between file storage, database, and compute components.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131077352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Regular Path Queries on Massive Graphs
Maurizio Nolé, C. Sartiani
{"title":"Regular Path Queries on Massive Graphs","authors":"Maurizio Nolé, C. Sartiani","doi":"10.1145/2949689.2949711","DOIUrl":"https://doi.org/10.1145/2949689.2949711","url":null,"abstract":"Regular Path Queries (RPQs) represent a powerful tool for querying graph databases and are of particular interest, because they form the building blocks of other query languages, and because they can be used in many theoretical or practical contexts for different purposes. In this paper we present a novel system for processing regular path queries on massive data graphs. As confirmed by an extensive experimental evaluation, our system scales linearly with the number of vertices and/or edges, and it can efficiently query graphs up to a billion vertices and 100 billion edges.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131847954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Framework for real-time clustering over sliding windows
Sobhan Badiozamany, Kjell Orsborn, T. Risch
{"title":"Framework for real-time clustering over sliding windows","authors":"Sobhan Badiozamany, Kjell Orsborn, T. Risch","doi":"10.1145/2949689.2949696","DOIUrl":"https://doi.org/10.1145/2949689.2949696","url":null,"abstract":"Clustering queries over sliding windows require maintaining cluster memberships that change as windows slide. To address this, the Generic 2-phase Continuous Summarization framework (G2CS) utilizes a generation-based window maintenance approach where windows are maintained over different time intervals. It provides algorithm-independent and efficient sliding mechanisms for clustering queries where the clustering algorithms are defined in terms of queries over cluster data represented as temporal tables. A particular challenge for real-time detection of a large number of rapidly evolving clusters is efficiently supporting smooth re-clustering in real time, i.e., minimizing the sliding time with increasing window size and decreasing strides. To efficiently support such re-clustering for clustering algorithms where deletion of expired data is not supported, e.g., BIRCH, G2CS includes a novel window maintenance mechanism called Sliding Binary Merge (SBM), which maintains several generations of intermediate window instances and does not require decremental cluster maintenance. To improve real-time sliding performance, G2CS uses generation-based multi-dimensional indexing. 
Extensive performance evaluation on both synthetic and real data shows that G2CS scales substantially better than related approaches.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133367872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Efficient Feedback Collection for Pay-as-you-go Source Selection
Julio César Cortés Ríos, N. Paton, A. Fernandes, Khalid Belhajjame
{"title":"Efficient Feedback Collection for Pay-as-you-go Source Selection","authors":"Julio César Cortés Ríos, N. Paton, A. Fernandes, Khalid Belhajjame","doi":"10.1145/2949689.2949690","DOIUrl":"https://doi.org/10.1145/2949689.2949690","url":null,"abstract":"Technical developments, such as the web of data and web data extraction, combined with policy developments such as those relating to open government or open science, are leading to the availability of increasing numbers of data sources. Indeed, given these physical sources, it is then also possible to create further virtual sources that integrate, aggregate or summarise the data from the original sources. As a result, there is a plethora of data sources, from which a small subset may be able to provide the information required to support a task. The number and rate of change in the available sources is likely to make manual source selection and curation by experts impractical for many applications, leading to the need to pursue a pay-as-you-go approach, in which crowds or data consumers annotate results based on their correctness or suitability, with the resulting annotations used to inform, e.g., source selection algorithms. However, for pay-as-you-go feedback collection to be cost-effective, it may be necessary to select judiciously the data items on which feedback is to be obtained. This paper describes OLBP (Ordering and Labelling By Precision), a heuristics-based approach to the targeting of data items for feedback to support mapping and source selection tasks, where users express their preferences in terms of the trade-off between precision and recall. The proposed approach is then evaluated on two different scenarios, mapping selection with synthetic data, and source selection with real data produced by web data extraction. 
The results demonstrate a significant reduction in the amount of feedback required to reach user-provided objectives when using OLBP.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115350634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Novel Data Reduction Based on Statistical Similarity
Dongeun Lee, A. Sim, Jaesik Choi, Kesheng Wu
{"title":"Novel Data Reduction Based on Statistical Similarity","authors":"Dongeun Lee, A. Sim, Jaesik Choi, Kesheng Wu","doi":"10.1145/2949689.2949708","DOIUrl":"https://doi.org/10.1145/2949689.2949708","url":null,"abstract":"Applications such as scientific simulations and power grid monitoring are generating data so quickly that compression is essential to reduce storage and transmission requirements. To achieve better compression, one is often willing to discard some repeated information. These lossy compression methods are primarily designed to minimize the Euclidean distance between the original data and the compressed data. However, this measure of distance severely limits either reconstruction quality or compression performance. We propose a new class of compression methods by redefining the distance measure with a statistical concept known as exchangeability. This approach captures essential features while reducing the storage requirement. In this paper, we report our design and implementation of such a compression method named IDEALEM. To demonstrate its effectiveness, we apply it to a set of power grid monitoring data, and show that it can reduce the volume of data much more than the best known compression method while maintaining the quality of the compressed data. 
In these tests, IDEALEM captures extraordinary events in the data, while its compression ratios can far exceed 100.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124048892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Array Database Scalability: Intercontinental Queries on Petabyte Datasets
A. Dumitru, Vlad Merticariu, P. Baumann
{"title":"Array Database Scalability: Intercontinental Queries on Petabyte Datasets","authors":"A. Dumitru, Vlad Merticariu, P. Baumann","doi":"10.1145/2949689.2949717","DOIUrl":"https://doi.org/10.1145/2949689.2949717","url":null,"abstract":"With the deluge of scientific big data affecting a large variety of research institutions, support for large multidimensional arrays has gained traction in the database community in the past decade. Array databases aim to cover the gap left by traditional relational database systems in the domains of large scientific data by enabling researchers to efficiently store and process their data through rich declarative query languages. Such large amounts of data need effective systems that are able to distribute the processing both at the local level, through exploitation of heterogeneous hardware, and at the network level, enabling both intra-cloud and intra-federation distribution of data and processing. In this demonstration we aim to showcase the capabilities of rasdaman by allowing users to execute queries that combine petabyte datasets stored at two institutions on different continents.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128702921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Functional Dependencies Unleashed for Scalable Data Exchange
A. Bonifati, Ioana Ileana, Michele Linardi
{"title":"Functional Dependencies Unleashed for Scalable Data Exchange","authors":"A. Bonifati, Ioana Ileana, Michele Linardi","doi":"10.1145/2949689.2949698","DOIUrl":"https://doi.org/10.1145/2949689.2949698","url":null,"abstract":"We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including the ones in Life Sciences in which multiple source relations need to be structured under a constrained target schema. However, despite their wide use, target fds' evaluation is still a bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL approach typically do not support target fds unless additional information is provided. Alternatively, DE engines that do include these dependencies typically pay the price of a significant drop in performance and scalability. In this paper, we present a novel chase-based algorithm that can efficiently handle arbitrary fds on the target. Our approach essentially relies on exploiting the interactions between source-to-target (s-t) tuple-generating dependencies (tgds) and target fds. This allows us to tame the size of the intermediate chase results, by playing on a careful ordering of chase steps interleaving fds and (chosen) tgds. As a direct consequence, we importantly diminish the fd application scope, often a central cause of the dramatic overhead induced by target fds. Moreover, reasoning on dependency interaction further leads us to interesting parallelization opportunities, yielding additional scalability gains. We provide a proof-of-concept implementation of our chase-based algorithm and an experimental study aimed at gauging its scalability and efficiency. 
Finally, we empirically compare with the latest DE engines, and show that our algorithm outperforms them.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115811502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Proceedings of the 28th International Conference on Scientific and Statistical Database Management
{"title":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","authors":"","doi":"10.1145/2949689","DOIUrl":"https://doi.org/10.1145/2949689","url":null,"abstract":"","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121479944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0