Proceedings of the 27th International Conference on Scientific and Statistical Database Management: Latest Papers

Ontology-assisted keyword search for NeuroML models
J. Birgiolas, S. Dietrich, S. Crook, Ashwin Rajadesingan, Chao Zhang, Shriharsha Velugoti Penchala, Veerasekhar Addepalli
{"title":"Ontology-assisted keyword search for NeuroML models","authors":"J. Birgiolas, S. Dietrich, S. Crook, Ashwin Rajadesingan, Chao Zhang, Shriharsha Velugoti Penchala, Veerasekhar Addepalli","doi":"10.1145/2791347.2791360","DOIUrl":"https://doi.org/10.1145/2791347.2791360","url":null,"abstract":"NeuroML is an extensible markup language for describing complex mathematical models of neurons and neuronal networks. NeuroML is unique in its modular, multi-scale structure -- not only can entire NeuroML models be exchanged, but subcomponents of these models that correspond to neuroscience objects, like channels or synapses, also can be shared and reimplemented in a different model. This paper presents the design, implementation, and evaluation of an ontology-assisted search for NeuroML models. Specifically, the paper describes the design of the system, including the database that stores the modular NeuroML models and the architecture of the Web-based search (neuroml-db.org). The implementation takes advantage of the nested structure of NeuroML models and the NeuroLex ontology for neuroscience to provide additional semantic information to enhance the search. In addition to NeuroLex terms that may exist in model metadata, this initial implementation takes advantage of several semantic relationships provided by the NeuroLex ontology: Is_part_of, Located_in, and Neurotransmitter. 
An evaluation of the system illustrates its effectiveness both for functionality and performance, covering various types of searches broken down by keyword searches over the database and ontology searches using the semantic relationships.","PeriodicalId":225179,"journal":{"name":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133736223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Top-k entity augmentation using consistent set covering
Julian Eberius, Maik Thiele, Katrin Braunschweig, Wolfgang Lehner
{"title":"Top-k entity augmentation using consistent set covering","authors":"Julian Eberius, Maik Thiele, Katrin Braunschweig, Wolfgang Lehner","doi":"10.1145/2791347.2791353","DOIUrl":"https://doi.org/10.1145/2791347.2791353","url":null,"abstract":"Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. State-of-the-art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of heterogeneous sources using information retrieval and automatic schema matching methods cannot easily return a single result that the user can trust, especially if the result is composed from a large number of sources that the user has to verify manually. We therefore propose to process these queries in a Top-k fashion, in which the system produces multiple minimal consistent solutions from which the user can choose to resolve the uncertainty of the data sources and methods used. In this paper, we introduce and formalize the problem of consistent, multi-solution set covering, and present algorithms based on a greedy and a genetic optimization approach. We then apply these algorithms to Web table-based entity augmentation. The publication further includes a Web table corpus with 100M tables, and a Web table retrieval and matching system in which these algorithms are implemented. 
Our experiments show that the consistency and minimality of the augmentation results can be improved using our set covering approach, without loss of precision or coverage and while producing multiple alternative query results.","PeriodicalId":225179,"journal":{"name":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129994213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
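The abstract above mentions a greedy approach to multi-solution set covering. The following is only a minimal sketch of that general idea, not the authors' algorithm: it runs a textbook greedy set cover and produces several alternative covers by excluding sources already used in earlier covers. The consistency criterion from the paper is not modeled, and the function names and the exclusion strategy are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def greedy_cover(entities: Set[str], sources: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Classic greedy set cover: repeatedly pick the source that covers
    the most still-uncovered entities. Returns None if no cover exists."""
    uncovered = set(entities)
    chosen: List[str] = []
    while uncovered:
        # Iterate in sorted order so ties are broken deterministically by name.
        best = max(sorted(sources), key=lambda s: len(sources[s] & uncovered), default=None)
        if best is None or not (sources[best] & uncovered):
            return None  # remaining entities cannot be covered
        chosen.append(best)
        uncovered -= sources[best]
    return chosen


def top_k_covers(entities: Set[str], sources: Dict[str, Set[str]], k: int) -> List[List[str]]:
    """Hypothetical multi-solution variant: produce up to k covers that
    share no source, by removing used sources after each cover."""
    remaining = dict(sources)
    covers: List[List[str]] = []
    for _ in range(k):
        cover = greedy_cover(entities, remaining)
        if cover is None:
            break
        covers.append(cover)
        for name in cover:
            del remaining[name]
    return covers
```

Forcing the covers to be fully disjoint is a deliberately crude way to obtain alternatives; it can return fewer than k covers even when more overlapping ones exist.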
A novel approach for approximate aggregations over arrays
Yi Wang, Yunde Su, G. Agrawal
{"title":"A novel approach for approximate aggregations over arrays","authors":"Yi Wang, Yunde Su, G. Agrawal","doi":"10.1145/2791347.2791349","DOIUrl":"https://doi.org/10.1145/2791347.2791349","url":null,"abstract":"Approximate aggregation has been a popular approach for interactive data analysis and decision making, especially on large-scale datasets. While there is clearly a need to apply this approach for scientific datasets comprising massive arrays, existing algorithms have largely been developed for relational data, and cannot handle both dimension-based and value-based predicates efficiently while maintaining accuracy. In this paper, we present a novel approach for approximate aggregations over array data, using bitmap indices or bitvectors as the summary structure, as they preserve both spatial and value distribution of the data. We develop approximate aggregation algorithms using only the bitvectors and certain additional pre-aggregation statistics (equivalent to a 1-dimensional histogram) that we require. Another key development is choosing a binning strategy that can improve aggregation accuracy -- we introduce a v-optimized binning strategy and its weighted extension, and present a bitmap construction algorithm with such binning. We compare our method with other existing methods including sampling and multi-dimensional histograms, as well as the use of other binning strategies with bitmaps. We demonstrate both high accuracy and efficiency of our approach. Specifically, we show that in most cases, our method is more accurate than other methods by at least one order of magnitude. 
Despite achieving much higher accuracy, our method can require significantly less storage than multi-dimensional histograms.","PeriodicalId":225179,"journal":{"name":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127084278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 89
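To make the idea of aggregating from bitvectors plus per-bin statistics more concrete, here is a small sketch under simplifying assumptions. It uses fixed bin edges supplied by the caller (not the v-optimized binning the abstract describes), stores each bitvector as a Python integer, and approximates a SUM over a position range by multiplying the number of set bits in each bin by that bin's mean. All function names are hypothetical.

```python
from typing import List, Tuple


def build_bitmaps(data: List[float], edges: List[float]) -> Tuple[List[int], List[float]]:
    """Build one bitvector per bin plus the per-bin mean.
    Bin i holds values v with edges[i] <= v < edges[i + 1];
    bit p of a bitvector is set when data[p] falls in that bin."""
    nbins = len(edges) - 1
    bitmaps = [0] * nbins
    sums = [0.0] * nbins
    counts = [0] * nbins
    for pos, v in enumerate(data):
        for b in range(nbins):
            if edges[b] <= v < edges[b + 1]:
                bitmaps[b] |= 1 << pos
                sums[b] += v
                counts[b] += 1
                break
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return bitmaps, means


def approx_sum(bitmaps: List[int], means: List[float], lo: int, hi: int) -> float:
    """Approximate SUM over positions lo..hi-1 without touching the raw data:
    count the set bits of each bin inside the range and weight by the bin mean."""
    mask = ((1 << (hi - lo)) - 1) << lo
    return sum(bin(bm & mask).count("1") * m for bm, m in zip(bitmaps, means))
```

The estimate is exact over the full array, because the per-bin means are computed from all values, and becomes approximate for sub-ranges, where the error depends on how widely values spread within each bin.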
Relaxation of subgraph queries delivering empty results
E. Vasilyeva, Maik Thiele, Adrian Mocan, Wolfgang Lehner
{"title":"Relaxation of subgraph queries delivering empty results","authors":"E. Vasilyeva, Maik Thiele, Adrian Mocan, Wolfgang Lehner","doi":"10.1145/2791347.2791382","DOIUrl":"https://doi.org/10.1145/2791347.2791382","url":null,"abstract":"Graph databases with the property graph model are used in multiple domains including social networks, biology, and data integration. They provide schema-flexible storage for data with varying degrees of structure and support complex, expressive queries such as subgraph isomorphism queries. The flexibility and expressiveness of graph databases make it difficult for users to express queries correctly and can lead to unexpected query results, e.g., empty results. Therefore, we propose a relaxation approach for subgraph isomorphism queries that is able to automatically rewrite a graph query, such that the rewritten query is similar to the original query and returns a non-empty result set. In detail, we present relaxation operations applicable to a query, cardinality estimation heuristics, and strategies for prioritizing graph query elements to be relaxed. To determine the similarity between the original query and its relaxed variants, we propose a novel cardinality-based graph edit distance. 
The feasibility of our approach is shown by using real-world queries from the DBpedia query log.","PeriodicalId":225179,"journal":{"name":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129723098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
OpenAlea: scientific workflows combining data analysis and simulation
C. Pradal, C. Fournier, P. Valduriez, Sarah Cohen Boulakia
{"title":"OpenAlea: scientific workflows combining data analysis and simulation","authors":"C. Pradal, C. Fournier, P. Valduriez, Sarah Cohen Boulakia","doi":"10.1145/2791347.2791365","DOIUrl":"https://doi.org/10.1145/2791347.2791365","url":null,"abstract":"Analyzing biological data (e.g., annotating genomes, assembling NGS data...) may involve very complex and interlinked steps where several tools are combined together. Scientific workflow systems have reached a level of maturity that makes them able to support the design and execution of such in-silico experiments, which has made them increasingly popular in the bioinformatics community. However, in some emerging application domains such as systems biology, developmental biology or ecology, the need for data analysis is combined with the need to model complex multi-scale biological systems, possibly involving multiple simulation steps. This requires the scientific workflow to deal with retro-action to understand and predict the relationships between structure and function of these complex systems. OpenAlea (openalea.gforge.inria.fr) is the only scientific workflow system able to uniformly address the problem, which made it successful in the scientific community. One of its main original features is the introduction of higher-order dataflows as a means to uniformly combine classical data analysis with modeling and simulation. In this demonstration paper, we provide for the first time the description of the OpenAlea system involving an original combination of features. 
We illustrate the demonstration on a high-throughput workflow in phenotyping, phenomics, and environmental control designed to study the interplay between plant architecture and climatic change.","PeriodicalId":225179,"journal":{"name":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128470905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
Similarity search in fuzzy object databases
Diana Uskat, Tobias Emrich, Andreas Züfle, Klaus Arthur Schmid, T. Bernecker, M. Renz
{"title":"Similarity search in fuzzy object databases","authors":"Diana Uskat, Tobias Emrich, Andreas Züfle, Klaus Arthur Schmid, T. Bernecker, M. Renz","doi":"10.1145/2791347.2791386","DOIUrl":"https://doi.org/10.1145/2791347.2791386","url":null,"abstract":"Fuzzy object databases are becoming more and more important in the context of image analysis. Examples include satellite images where blurred trees, houses or lakes can still be organized and searched in a meaningful manner and biomedical images which can be utilized to find similar disease patterns and monitor disease progress. One problem of the underlying data is that it contains blurred image content, i.e., fuzzy data. Therefore, an image-based similarity search, which can process huge amounts of fuzzy data in an efficient and effective way, is desirable. The aim of this work is to develop efficient and effective methods for similarity search in fuzzy object databases. First, a suitable similarity measure based on a shape similarity is proposed. Based on this, two novel k-nearest neighbor algorithms for efficient similarity search are presented. The first approach gains efficiency at the cost of incurring only approximate results, while the second approach uses a filter-refinement approach to prune computation. 
Our experimental evaluation shows the efficiency of the proposed algorithms.","PeriodicalId":225179,"journal":{"name":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130923732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Privacy-preserving big data publishing
Hessam Zakerzadeh, C. Aggarwal, K. Barker
{"title":"Privacy-preserving big data publishing","authors":"Hessam Zakerzadeh, C. Aggarwal, K. Barker","doi":"10.1145/2791347.2791380","DOIUrl":"https://doi.org/10.1145/2791347.2791380","url":null,"abstract":"The problem of privacy-preserving data mining has been studied extensively in recent years because of its importance as a key enabler in the sharing of massive data sets. Most of the work in privacy has focussed on issues involving the quality of privacy preservation and utility, though there has been little focus on the issue of scalability in privacy preservation. The reason for this is that anonymization has generally been seen as a batch and one-time process in the context of data sharing. However, in recent years, the sizes of data sets have grown tremendously to a point where the effective application of the current algorithms is becoming increasingly difficult. Furthermore, the transient nature of recent data sets has resulted in an increased need for the repeated application of such methods on the newer data sets which have been collected. Repeated application demands even greater computational efficiency in order to be practical. For example, an algorithm with quadratic complexity is unlikely to be implementable in reasonable time over terabyte scale data sets. A bigger issue is that larger data sets are likely to be addressed by distributed frameworks such as MapReduce. In such frameworks, one has to address the additional issue of minimizing data transfer across different nodes, which is the bottleneck. In this paper, we discuss the first approach towards privacy-preserving data mining of very massive data sets using MapReduce. 
We study the two most widely used privacy models for anonymization, k-anonymity and l-diversity, and present experimental results illustrating the effectiveness of the approach.","PeriodicalId":225179,"journal":{"name":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116133232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 66
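As a rough illustration of the two privacy models named in the abstract, the sketch below checks whether an already generalized table satisfies k-anonymity and distinct l-diversity. It is written as a map step and a reduce step to echo the MapReduce setting, but it runs in a single process and does not perform any anonymization itself. It is not the paper's algorithm; the function names and record layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]


def map_phase(records: Iterable[Record], quasi_ids: List[str]) -> Iterator[Tuple[tuple, Record]]:
    """Map step: key each record by its quasi-identifier values."""
    for r in records:
        yield tuple(r[a] for a in quasi_ids), r


def reduce_phase(pairs: Iterable[Tuple[tuple, Record]], sensitive: str) -> Dict[tuple, List[str]]:
    """Reduce step: collect the sensitive values of each equivalence class."""
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for key, r in pairs:
        groups[key].append(r[sensitive])
    return groups


def is_k_anonymous(groups: Dict[tuple, List[str]], k: int) -> bool:
    """Every equivalence class must contain at least k records."""
    return all(len(values) >= k for values in groups.values())


def is_l_diverse(groups: Dict[tuple, List[str]], l: int) -> bool:
    """Distinct l-diversity: every class has at least l different sensitive values."""
    return all(len(set(values)) >= l for values in groups.values())
```

In a real distributed run the grouping would be done by the framework's shuffle, so only the small per-class summaries, rather than whole records, would need to cross node boundaries.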
FiND: a real-time filtering by novelty and diversity for publish/subscribe systems
Zeinab Hmedeh, C. Mouza, Nicolas Travers
{"title":"FiND: a real-time filtering by novelty and diversity for publish/subscribe systems","authors":"Zeinab Hmedeh, C. Mouza, Nicolas Travers","doi":"10.1145/2791347.2791356","DOIUrl":"https://doi.org/10.1145/2791347.2791356","url":null,"abstract":"Content syndication has become a popular way for timely delivery of frequently updated information on the Web. It essentially enhances traditional pull-oriented searching and browsing of web pages with push-oriented protocols. However, many Web syndication applications imply a tight coupling between feed producers and consumers and do not help users find, among all the information they receive, items with interesting and new content. We present the FiND Pub/Sub system, which integrates an in-memory filtering process based on keyword subscriptions. Unlike existing proposals, FiND is designed for real-time notifications on item streams. This demonstration illustrates the main features of the FiND system, namely (i) a scalable real-time notification process when the most important terms of the subscription are matched, and (ii) a tunable filtering by novelty and diversity to reduce user flooding.","PeriodicalId":225179,"journal":{"name":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","volume":"55 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133651675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Efficient similarity search in scientific databases with feature signatures
M. S. Uysal, C. Beecks, Jochen Schmücking, T. Seidl
{"title":"Efficient similarity search in scientific databases with feature signatures","authors":"M. S. Uysal, C. Beecks, Jochen Schmücking, T. Seidl","doi":"10.1145/2791347.2791384","DOIUrl":"https://doi.org/10.1145/2791347.2791384","url":null,"abstract":"The recent rapid growth of scientific data necessitates efficient similarity search techniques for which convenient object representation models are of vital importance. Feature signatures, which are highly flexible object feature representations, have increasingly gained attention, and corresponding efficiency improvement techniques have been developed. In this paper, we focus on efficient query processing with the well-known Earth Mover's Distance (EMD) on databases of feature signatures, and propose efficient approximation techniques successfully applicable to high-dimensional feature signatures via dimensionality reduction, guaranteeing both completeness and no false dismissals within a filter-and-refine architecture. Rigorous experiments on real-world data indicate a considerable reduction in the number of EMD computations and high efficiency of the proposed techniques, which significantly reduce the query processing time.","PeriodicalId":225179,"journal":{"name":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134089593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
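The filter-and-refine architecture mentioned in the abstract can be sketched generically. The example below swaps the Earth Mover's Distance for plain Euclidean distance to stay short, and uses the distance on the first few coordinates as the lower bound obtained by dimensionality reduction. Because that bound never exceeds the exact distance, candidates can be pruned without false dismissals. This is an illustrative sketch, not the paper's technique, and the function names are assumptions.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance over the coordinates the two sequences share."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_filter_refine(
    query: Sequence[float], db: List[Sequence[float]], k: int, d_reduced: int
) -> Tuple[List[Tuple[float, int]], int]:
    """Return the k nearest neighbors as (distance, index) pairs and the
    number of exact distance computations that were actually needed."""
    # Filter step: cheap lower bounds on the reduced representation.
    bounds = sorted(
        (euclid(query[:d_reduced], obj[:d_reduced]), i) for i, obj in enumerate(db)
    )
    best: List[Tuple[float, int]] = []  # max-heap via negated distances
    refinements = 0
    for bound, i in bounds:
        # Once the lower bound exceeds the current k-th exact distance,
        # no remaining candidate can enter the result.
        if len(best) == k and bound > -best[0][0]:
            break
        # Refine step: exact distance on the full representation.
        dist = euclid(query, db[i])
        refinements += 1
        if len(best) < k:
            heapq.heappush(best, (-dist, i))
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, i))
    return sorted((-neg, i) for neg, i in best), refinements
```

The savings depend entirely on how tight the lower bound is: a loose bound prunes little and the search degrades toward a full scan.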
Indexing bi-temporal windows
Chang Ge, Martin Kaufmann, Lukasz Golab, Peter M. Fischer, Anil K. Goel
{"title":"Indexing bi-temporal windows","authors":"Chang Ge, Martin Kaufmann, Lukasz Golab, Peter M. Fischer, Anil K. Goel","doi":"10.1145/2791347.2791373","DOIUrl":"https://doi.org/10.1145/2791347.2791373","url":null,"abstract":"Bi-temporal databases support system (transaction) and application time, enabling users to query the history as recorded today and as it was known in the past. In this paper, we study windows over both system and application time, i.e., bi-temporal windows. We propose a two-dimensional index that supports one-time and continuous queries over fixed and sliding bi-temporal windows, covering static and streaming data. We demonstrate the advantages of the proposed index compared to the state-of-the-art in terms of query performance, index update overhead and space footprint.","PeriodicalId":225179,"journal":{"name":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134525778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2