Proceedings of the 27th International Conference on Scientific and Statistical Database Management: latest papers, page 3

Ontology-assisted keyword search for NeuroML models

Proceedings of the 27th International Conference on Scientific and Statistical Database Management Pub Date : 2015-06-29 DOI: 10.1145/2791347.2791360

J. Birgiolas, S. Dietrich, S. Crook, Ashwin Rajadesingan, Chao Zhang, Shriharsha Velugoti Penchala, Veerasekhar Addepalli

Abstract: NeuroML is an extensible markup language for describing complex mathematical models of neurons and neuronal networks. NeuroML is unique in its modular, multi-scale structure: not only can entire NeuroML models be exchanged, but subcomponents of these models that correspond to neuroscience objects, like channels or synapses, also can be shared and reimplemented in a different model. This paper presents the design, implementation, and evaluation of an ontology-assisted search for NeuroML models. Specifically, the paper describes the design of the system, including the database that stores the modular NeuroML models and the architecture of the Web-based search (neuroml-db.org). The implementation takes advantage of the nested structure of NeuroML models and the NeuroLex ontology for neuroscience to provide additional semantic information to enhance the search. In addition to NeuroLex terms that may exist in model metadata, this initial implementation takes advantage of several semantic relationships provided by the NeuroLex ontology: Is_part_of, Located_in, and Neurotransmitter. An evaluation of the system illustrates its effectiveness both for functionality and performance, covering various types of searches broken down by keyword searches over the database and ontology searches using the semantic relationships.

Citations: 6

Top-k entity augmentation using consistent set covering

Proceedings of the 27th International Conference on Scientific and Statistical Database Management Pub Date : 2015-06-29 DOI: 10.1145/2791347.2791353

Julian Eberius, Maik Thiele, Katrin Braunschweig, Wolfgang Lehner

Abstract: Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. State-of-the-art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of heterogeneous sources using information retrieval and automatic schema matching methods cannot easily return a single result that the user can trust, especially if the result is composed from a large number of sources that the user has to verify manually. We therefore propose to process these queries in a Top-k fashion, in which the system produces multiple minimal consistent solutions from which the user can choose to resolve the uncertainty of the data sources and methods used. In this paper, we introduce and formalize the problem of consistent, multi-solution set covering, and present algorithms based on a greedy and a genetic optimization approach. We then apply these algorithms to Web table-based entity augmentation. The publication further includes a Web table corpus with 100M tables, and a Web table retrieval and matching system in which these algorithms are implemented. Our experiments show that the consistency and minimality of the augmentation results can be improved using our set covering approach, without loss of precision or coverage and while producing multiple alternative query results.

Citations: 40
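
The abstract above frames entity augmentation as a multi-solution set covering problem. The following is only a minimal sketch of that idea, not the authors' algorithm: it runs a plain greedy set cover several times and discounts sources already used in earlier covers so that later covers differ. The names `greedy_cover`, `top_k_covers`, and `reuse_penalty` are hypothetical, and the consistency scoring between sources that the paper describes is left out.

```python
def greedy_cover(entities, sources, penalty=None):
    """Greedily pick sources until every entity is covered (or none can help)."""
    penalty = penalty or {}
    uncovered = set(entities)
    chosen = []
    while uncovered:
        # Only sources that still cover something new are candidates.
        candidates = [s for s in sorted(sources) if sources[s] & uncovered]
        if not candidates:
            break  # the remaining entities cannot be covered by any source
        # Score = newly covered entities, discounted for earlier reuse.
        best = max(
            candidates,
            key=lambda s: len(sources[s] & uncovered) - penalty.get(s, 0.0),
        )
        chosen.append(best)
        uncovered -= sources[best]
    return chosen, uncovered


def top_k_covers(entities, sources, k, reuse_penalty=0.5):
    """Return up to k distinct covers; reuse_penalty is an assumed heuristic."""
    penalty = {}
    covers = []
    for _ in range(k):
        chosen, uncovered = greedy_cover(entities, sources, penalty)
        if uncovered or chosen in covers:
            break  # no full cover, or no new alternative found
        covers.append(chosen)
        for s in chosen:
            penalty[s] = penalty.get(s, 0.0) + reuse_penalty
    return covers


# Hypothetical example: four entities, four Web tables covering subsets of them.
example_sources = {
    "t1": {"a", "b"},
    "t2": {"c", "d"},
    "t3": {"a", "b", "c"},
    "t4": {"d"},
}
for cover in top_k_covers({"a", "b", "c", "d"}, example_sources, k=3):
    print(cover)
```

Each returned list is one alternative set of sources that covers all queried entities, which a user could then compare, in the spirit of the Top-k processing the abstract proposes.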

Relaxation of subgraph queries delivering empty results

Proceedings of the 27th International Conference on Scientific and Statistical Database Management Pub Date : 2015-06-29 DOI: 10.1145/2791347.2791382

E. Vasilyeva, Maik Thiele, Adrian Mocan, Wolfgang Lehner

Abstract: Graph databases with the property graph model are used in multiple domains including social networks, biology, and data integration. They provide schema-flexible storage for data with varying degrees of structure and support complex, expressive queries such as subgraph isomorphism queries. The flexibility and expressiveness of graph databases make it difficult for users to express queries correctly and can lead to unexpected query results, e.g., empty results. Therefore, we propose a relaxation approach for subgraph isomorphism queries that is able to automatically rewrite a graph query, such that the rewritten query is similar to the original query and returns a non-empty result set. In detail, we present relaxation operations applicable to a query, cardinality estimation heuristics, and strategies for prioritizing graph query elements to be relaxed. To determine the similarity between the original query and its relaxed variants, we propose a novel cardinality-based graph edit distance. The feasibility of our approach is shown by using real-world queries from the DBpedia query log.

Citations: 8

Distributed top-k query processing on multi-dimensional data with keywords

Proceedings of the 27th International Conference on Scientific and Statistical Database Management Pub Date : 2015-06-29 DOI: 10.1145/2791347.2791355

Daichi Amagata, T. Hara, S. Nishio

Abstract: As we are in the big data era, techniques for retrieving only user-desirable data objects from massive and diverse datasets are required. Ranking queries, e.g., top-k queries, which rank data objects based on a user-specified scoring function, enable users to find such interesting data and have received significant attention due to their wide range of applications. While many techniques for both centralized and distributed top-k query processing have been developed, they do not consider query keywords, i.e., they simply retrieve the k data objects with the best score. Utilizing keywords, on the other hand, is a common approach in data (and information) retrieval. Despite this fact, there is no study on retrieving top-k data containing all query keywords. We define, in this paper, a new query which enriches the conventional top-k queries, and propose some algorithms to solve the novel problem of how to efficiently retrieve k data objects with the best score and all query keywords from distributed databases. Extensive experiments on both real and synthetic data have demonstrated the efficiency and scalability of our algorithms in terms of communication cost and running time.

Citations: 11
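
To make the query in the abstract above concrete, here is a minimal sketch of a naive baseline, not the algorithms proposed in the paper: every node filters its objects to those containing all query keywords, returns its local top-k, and a coordinator merges the candidates. The record layout (`values`, `keywords`), the function names, and the assumption that a higher score is better are all illustrative.

```python
import heapq


def local_top_k(objects, keywords, k, score):
    """Top-k objects on one node that contain every query keyword."""
    matches = (o for o in objects if keywords <= o["keywords"])
    return heapq.nlargest(k, matches, key=score)


def distributed_top_k(nodes, keywords, k, score):
    """Merge the local top-k lists; the global top-k is always among them."""
    candidates = []
    for objects in nodes:
        candidates.extend(local_top_k(objects, keywords, k, score))
    return heapq.nlargest(k, candidates, key=score)


# Hypothetical data: two nodes holding multi-dimensional objects with keywords.
nodes = [
    [
        {"id": 1, "values": (0.9, 0.2), "keywords": {"db", "graph"}},
        {"id": 2, "values": (0.5, 0.5), "keywords": {"db"}},
    ],
    [
        {"id": 3, "values": (0.4, 0.9), "keywords": {"db", "graph", "ml"}},
        {"id": 4, "values": (1.0, 1.0), "keywords": {"ml"}},
    ],
]
result = distributed_top_k(nodes, {"db", "graph"}, k=2, score=lambda o: sum(o["values"]))
print([o["id"] for o in result])
```

This baseline ships up to k objects from every node, so its communication cost grows with the number of nodes; reducing that cost is the kind of efficiency concern the abstract says the paper's experiments evaluate.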

Similarity search in fuzzy object databases

Proceedings of the 27th International Conference on Scientific and Statistical Database Management Pub Date : 2015-06-29 DOI: 10.1145/2791347.2791386

Diana Uskat, Tobias Emrich, Andreas Züfle, Klaus Arthur Schmid, T. Bernecker, M. Renz

Abstract: Fuzzy object databases are becoming more and more important in the context of image analysis. Examples include satellite images, where blurred trees, houses or lakes can still be organized and searched in a meaningful manner, and biomedical images, which can be utilized to find similar disease patterns and monitor disease progress. One problem of the underlying data is that it contains blurred image content, i.e., fuzzy data. Therefore, an image-based similarity search that can process huge amounts of fuzzy data in an efficient and effective way is desirable. The aim of this work is to develop efficient and effective methods for similarity search in fuzzy object databases. First, a suitable similarity measure based on shape similarity is proposed. Based on this, two novel k-nearest neighbor algorithms for efficient similarity search are presented. The first approach gains efficiency at the cost of returning only approximate results, while the second approach uses a filter-refinement approach to prune computation. Our experimental evaluation shows the efficiency of the proposed algorithms.

Citations: 1
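
The filter-refinement idea mentioned in the abstract above can be illustrated with a generic k-nearest-neighbor sketch. This is not the paper's fuzzy-object algorithm: it assumes a hypothetical cheap `lower_bound` function that never exceeds the true distance and an expensive `exact_distance` function, and it uses plain 2-D points in place of fuzzy objects.

```python
import heapq
import math


def knn_filter_refine(query, objects, k, lower_bound, exact_distance):
    """k-NN that skips exact distance computations using a lower bound."""
    # Filter step: visit candidates in order of their cheap lower bound.
    ranked = sorted(objects, key=lambda o: lower_bound(query, o))
    best = []  # max-heap on distance, stored as (-distance, index, object)
    for idx, obj in enumerate(ranked):
        if len(best) == k and lower_bound(query, obj) >= -best[0][0]:
            break  # no remaining candidate can beat the current k-th neighbor
        # Refinement step: pay for the exact distance only when needed.
        d = exact_distance(query, obj)
        if len(best) < k:
            heapq.heappush(best, (-d, idx, obj))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, idx, obj))
    return [obj for _, _, obj in sorted(best, key=lambda t: (-t[0], t[1]))]


def x_gap(q, o):
    """Assumed lower bound: the x-axis gap never exceeds the Euclidean distance."""
    return abs(q[0] - o[0])


def euclidean(q, o):
    """Exact distance used in the refinement step."""
    return math.dist(q, o)


points = [(1, 0), (0, 3), (2, 2), (5, 5)]
print(knn_filter_refine((0, 0), points, 2, x_gap, euclidean))
```

Because the lower bound is valid, the early `break` cannot discard a true neighbor, so the result is exact; only the number of exact distance computations is reduced.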

Privacy-preserving big data publishing

Proceedings of the 27th International Conference on Scientific and Statistical Database Management Pub Date : 2015-06-29 DOI: 10.1145/2791347.2791380

Hessam Zakerzadeh, C. Aggarwal, K. Barker

Abstract: The problem of privacy-preserving data mining has been studied extensively in recent years because of its importance as a key enabler in the sharing of massive data sets. Most of the work in privacy has focused on issues involving the quality of privacy preservation and utility, though there has been little focus on the issue of scalability in privacy preservation. The reason for this is that anonymization has generally been seen as a batch and one-time process in the context of data sharing. However, in recent years, the sizes of data sets have grown tremendously to a point where the effective application of the current algorithms is becoming increasingly difficult. Furthermore, the transient nature of recent data sets has resulted in an increased need for the repeated application of such methods on the newer data sets which have been collected. Repeated application demands even greater computational efficiency in order to be practical. For example, an algorithm with quadratic complexity is unlikely to be implementable in reasonable time over terabyte-scale data sets. A bigger issue is that larger data sets are likely to be addressed by distributed frameworks such as MapReduce. In such frameworks, one has to address the additional issue of minimizing data transfer across different nodes, which is the bottleneck. In this paper, we discuss the first approach towards privacy-preserving data mining of very massive data sets using MapReduce. We study two of the most widely used privacy models, k-anonymity and l-diversity, for anonymization, and present experimental results illustrating the effectiveness of the approach.

Citations: 66
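
For readers unfamiliar with the two privacy models named in the abstract above, the following minimal sketch checks whether a table already satisfies k-anonymity and distinct l-diversity. It only verifies the properties on a single machine; it does not anonymize data and is not the MapReduce approach of the paper. The field names and function names are hypothetical.

```python
from collections import Counter, defaultdict


def is_k_anonymous(records, quasi_ids, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())


def is_l_diverse(records, quasi_ids, sensitive, l):
    """True if every group has at least l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_ids)].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical, already generalized records (age ranges, truncated ZIP codes).
records = [
    {"age": "30-39", "zip": "481**", "disease": "flu"},
    {"age": "30-39", "zip": "481**", "disease": "asthma"},
    {"age": "40-49", "zip": "482**", "disease": "flu"},
    {"age": "40-49", "zip": "482**", "disease": "flu"},
]
print(is_k_anonymous(records, ["age", "zip"], k=2))
print(is_l_diverse(records, ["age", "zip"], "disease", l=2))
```

The grouping step is a count per key, which is why such checks map naturally onto a MapReduce-style framework of the kind the abstract mentions: a map phase would emit the quasi-identifier tuple and a reduce phase would aggregate per group.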

A novel approach for approximate aggregations over arrays

Proceedings of the 27th International Conference on Scientific and Statistical Database Management Pub Date : 2015-06-29 DOI: 10.1145/2791347.2791349

Yi Wang, Yunde Su, G. Agrawal

Abstract: Approximate aggregation has been a popular approach for interactive data analysis and decision making, especially on large-scale datasets. While there is clearly a need to apply this approach for scientific datasets comprising massive arrays, existing algorithms have largely been developed for relational data, and cannot handle both dimension-based and value-based predicates efficiently while maintaining accuracy. In this paper, we present a novel approach for approximate aggregations over array data, using bitmap indices or bitvectors as the summary structure, as they preserve both spatial and value distribution of the data. We develop approximate aggregation algorithms using only the bitvectors and certain additional pre-aggregation statistics (equivalent to a 1-dimensional histogram) that we require. Another key development is choosing a binning strategy that can improve aggregation accuracy: we introduce a v-optimized binning strategy and its weighted extension, and present a bitmap construction algorithm with such binning. We compare our method with other existing methods including sampling and multi-dimensional histograms, as well as the use of other binning strategies with bitmaps. We demonstrate both high accuracy and efficiency of our approach. Specifically, we show that in most cases, our method is more accurate than other methods by at least one order of magnitude. Despite achieving much higher accuracy, our method can require significantly less storage than multi-dimensional histograms.

Citations: 89
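
The abstract above describes answering aggregations from bitvectors plus per-bin statistics. The sketch below illustrates that general idea under simplifying assumptions: a 1-D array, fixed bin edges supplied by the caller (not the v-optimized binning the paper introduces), and Python integers standing in for compressed bitvectors. The function names are hypothetical.

```python
import bisect


def build_bitmaps(values, edges):
    """One integer bitvector per bin plus the mean of the values in each bin."""
    nbins = len(edges) - 1
    bitmaps = [0] * nbins
    sums = [0.0] * nbins
    counts = [0] * nbins
    for i, v in enumerate(values):
        # Clamp so values on or beyond the outer edges fall in the outer bins.
        b = max(0, min(bisect.bisect_right(edges, v) - 1, nbins - 1))
        bitmaps[b] |= 1 << i  # bit i marks array position i
        sums[b] += v
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return bitmaps, means


def approx_sum(bitmaps, means, lo, hi, bins=None):
    """Approximate SUM over positions lo..hi-1, optionally restricted to bins."""
    region = ((1 << (hi - lo)) - 1) << lo  # dimension-based predicate as a mask
    selected = range(len(bitmaps)) if bins is None else bins
    total = 0.0
    for b in selected:
        hits = bin(bitmaps[b] & region).count("1")  # elements of bin b in region
        total += hits * means[b]
    return total


values = [1.0, 2.0, 9.0, 10.0, 1.5, 9.5]
bitmaps, means = build_bitmaps(values, edges=[0, 5, 10])
print(approx_sum(bitmaps, means, lo=0, hi=4))
print(approx_sum(bitmaps, means, lo=0, hi=6, bins=[1]))
```

The `region` mask plays the role of a dimension-based predicate and the `bins` argument that of a value-based predicate. The error comes from replacing each element by its bin mean, which is why the choice of bin boundaries matters for accuracy.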

Estimating mutual information on data streams

Proceedings of the 27th International Conference on Scientific and Statistical Database Management Pub Date : 2015-06-29 DOI: 10.1145/2791347.2791348

F. Keller, Emmanuel Müller, Klemens Böhm

Citations: 29

OpenAlea: scientific workflows combining data analysis and simulation

Proceedings of the 27th International Conference on Scientific and Statistical Database Management Pub Date : 2015-06-29 DOI: 10.1145/2791347.2791365

C. Pradal, C. Fournier, P. Valduriez, Sarah Cohen Boulakia

Abstract: Analyzing biological data (e.g., annotating genomes, assembling NGS data) may involve very complex and interlinked steps where several tools are combined together. Scientific workflow systems have reached a level of maturity that makes them able to support the design and execution of such in-silico experiments, making them increasingly popular in the bioinformatics community. However, in some emerging application domains such as systems biology, developmental biology or ecology, the need for data analysis is combined with the need to model complex multi-scale biological systems, possibly involving multiple simulation steps. This requires the scientific workflow to deal with retro-action to understand and predict the relationships between structure and function of these complex systems. OpenAlea (openalea.gforge.inria.fr) is the only scientific workflow system able to uniformly address the problem, which made it successful in the scientific community. One of its main original features is the introduction of higher-order dataflows as a means to uniformly combine classical data analysis with modeling and simulation. In this demonstration paper, we provide for the first time the description of the OpenAlea system involving an original combination of features. We illustrate the demonstration on a high-throughput workflow in phenotyping, phenomics, and environmental control designed to study the interplay between plant architecture and climatic change.

Citations: 53

Proceedings of the 27th International Conference on Scientific and Statistical Database Management Pub Date : 2015-06-29 DOI: 10.1145/2791347.2791384

M. S. Uysal, C. Beecks, Jochen Schmücking, T. Seidl

Citations: 15