Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management: latest articles, page 3

Integrating non-spatial preferences into spatial location queries

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618247

Qiang Qu, Siyuan Liu, B. Yang, Christian S. Jensen

Abstract: Increasing volumes of geo-referenced data are becoming available. This data includes so-called points of interest that describe businesses, tourist attractions, etc. by means of a geo-location and properties such as a textual description or ratings. We propose and study the efficient implementation of a new kind of query on points of interest that takes into account both the locations and properties of the points of interest. The query takes a result cardinality, a spatial range, and property-related preferences as parameters, and it returns a compact set of points of interest with the given cardinality and in the given range that satisfies the preferences. Specifically, the points of interest in the result set cover so-called allying preferences and are located far from points of interest that possess so-called alienating preferences. A unified result rating function integrates the two kinds of preferences with spatial distance to achieve this functionality. We provide efficient exact algorithms for this kind of query. To enable queries on large datasets, we also provide an approximate algorithm that utilizes a nearest-neighbor property to achieve scalable performance. We develop and apply lower and upper bounds that enable search-space pruning and thus improve performance. Finally, we provide a generalization of the above query and also extend the algorithms to support the generalization. We report on an experimental evaluation of the proposed algorithms using real point of interest data from Google Places for Business that offers insight into the performance of the proposed solutions.

Pages: 8:1-8:12

Citations: 26

Helping scientists reconnect their datasets

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618263

Abdussalam Alawini, D. Maier, K. Tufte, Bill Howe

Abstract: It seems inevitable that the datasets associated with a research project proliferate over time: collaborators may extend datasets with new measurements and new attributes, new experimental runs result in new files with similar structures, and subsets of data are extracted for independent analysis. As these "residual" datasets begin to accrete over time, scientists can lose track of the derivation history that connects them, complicating data sharing, provenance tracking, and scientific reproducibility. In this paper, focusing on data in spreadsheets, we consider how observable relationships between two datasets can help scientists recall their original derivation connection. For instance, if dataset A is wholly contained in dataset B, B may be a more recent version of A and should be preferred when archiving or publishing. We articulate a space of relevant relationships, develop a set of algorithms for efficient discovery of these relationships, and organize these algorithms into a new system called ReConnect to assist scientists in relationship discovery. Our evaluation shows that existing approaches that rely on flagging differences between two spreadsheets are impractical for many relationship-discovery tasks, and a user study shows that ReConnect can improve scientists' ability to detect useful relationships and subsequently identify the best dataset for a given task.

Pages: 29:1-29:12

Citations: 13
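
The containment relationship mentioned in the ReConnect entry above (dataset A wholly contained in dataset B) can be illustrated with a minimal sketch. This is not the ReConnect algorithm, which the abstract does not spell out. It assumes each spreadsheet is a header plus a list of rows, and `is_contained` is a hypothetical helper name introduced only for this illustration.

```python
from collections import Counter


def is_contained(a_header, a_rows, b_header, b_rows):
    """Return True if every column of A exists in B and every row of A
    also appears in B once B is projected onto A's columns.

    Assumption: rows are compared as multisets, so a row that occurs
    twice in A must occur at least twice in the projection of B.
    """
    if not set(a_header) <= set(b_header):
        return False
    # Positions of A's columns inside B, in A's column order.
    idx = [b_header.index(col) for col in a_header]
    b_projected = Counter(tuple(row[i] for i in idx) for row in b_rows)
    a_counts = Counter(tuple(row) for row in a_rows)
    return all(b_projected[row] >= n for row, n in a_counts.items())


# Hypothetical example data: B extends A with a column and a row.
a_header = ["id", "temp"]
a_rows = [(1, 20.5), (2, 21.0)]
b_header = ["id", "depth", "temp"]
b_rows = [(1, 5, 20.5), (2, 5, 21.0), (3, 10, 19.0)]
a_in_b = is_contained(a_header, a_rows, b_header, b_rows)
b_in_a = is_contained(b_header, b_rows, a_header, a_rows)
```

Here `a_in_b` is true and `b_in_a` is false, which is the situation in which the abstract suggests B may be the more recent version.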

Efficient temporal shortest path queries on evolving social graphs

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618282

Wenyu Huo, V. Tsotras

Citations: 38

Inverse predictions on continuous models in scientific databases

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618249

A. M. Zimmer, Philip Driessen, P. Kranen, T. Seidl

Abstract: Using continuous models in scientific databases has received an increased attention in the last years. It allows for a more efficient and accurate querying, as well as predictions of the outputs even where no measurements were performed. The most common queries are on how the output looks like for a given input setting. In this paper we study inverse model-based queries on continuous models, where one specifies a desired output and searches for the appropriate input setting, which falls into the reverse engineering category. We propose two possible approaches. The first one is an extension of the inverse regression paradigm. But simply switching the roles of input and output variables poses new challenges, which we overcome by using partial least squares. The second approach formulates the inverse prediction queries as linear optimization problems. We show that even though these two approaches seem completely different, they are closely related, and that the latter is more general. It facilitates the formulation of a wide range of queries, with specifications of fixed values and ranges in both input and output space, enabling the intuitive exploration of the experimental data and understanding the underlying process.

Pages: 26:1-26:12

Citations: 2

Offline cleaning of RFID trajectory data

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618271

Bettina Fazzinga, S. Flesca, F. Furfaro, F. Parisi

Abstract: An offline cleaning technique is proposed for translating the readings generated by RFID-tracked moving objects into positions over a map. It consists in a grid-based two-way filtering scheme embedding a sampling strategy for addressing missing detections. The readings are first processed in time order: at each time point t, the positions (i.e., cells of a grid assumed over the map) compatible with the reading at t are filtered according to their reachability from the positions that survived the filtering for the previous time point. Then, the positions that survived the first filtering are re-filtered, applying the same scheme in inverse order. As the two phases proceed, a probability is progressively evaluated for each candidate position at each time point t: at the end, this probability assembles the three probabilities of being the actual position given the past and future positions, and given the reading at t. A sampling procedure is employed at certain steps of the first filtering phase to intelligently reduce the number of cells to be considered as candidate positions at the next steps, as their number can grow dramatically in the presence of consecutive missing detections. The proposed approach is experimentally validated and shown to be efficient and effective in accomplishing its task.

Pages: 5:1-5:12

Citations: 16
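
The two-way filtering idea in the RFID entry above can be sketched in a heavily simplified form. The sketch keeps only the set-based forward and backward passes. It leaves out the probabilities and the sampling step that the abstract describes, and the `reachable` rule (at most `max_step` grid cells per time step along each axis) is an assumption made for illustration, not the paper's motion model.

```python
def reachable(a, b, max_step=1):
    """Hypothetical reachability test between two (row, col) grid cells:
    the object moves at most max_step cells per time step on each axis."""
    return abs(a[0] - b[0]) <= max_step and abs(a[1] - b[1]) <= max_step


def two_way_filter(candidates, max_step=1):
    """candidates[t] is the set of cells compatible with the reading at
    time t. Returns, per time point, the cells surviving both passes."""
    if not candidates:
        return []
    # Forward pass: keep cells reachable from a survivor at t - 1.
    forward = [set(candidates[0])]
    for cells in candidates[1:]:
        prev = forward[-1]
        forward.append({c for c in cells
                        if any(reachable(p, c, max_step) for p in prev)})
    # Backward pass: keep cells that can reach a survivor at t + 1.
    result = list(forward)
    for t in range(len(forward) - 2, -1, -1):
        nxt = result[t + 1]
        result[t] = {c for c in forward[t]
                     if any(reachable(c, n, max_step) for n in nxt)}
    return result


# Hypothetical readings: (5, 5) and (3, 3) are inconsistent with the rest.
cleaned = two_way_filter([{(0, 0), (5, 5)}, {(0, 1)}, {(0, 2), (3, 3)}])
```

The forward pass alone cannot discard `(5, 5)` at the first time point, since nothing precedes it; the backward pass removes it because it cannot reach any surviving cell at the next time point.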

Proactive adaptations in sensor network query processing

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618267

A. B. Stokes, N. Paton, A. Fernandes

Abstract: Wireless sensor networks (WSN) are used by many applications for event and environmental monitoring. Due to the resource-limited nodes in WSNs, there has been much research into extending the functional lifetime of the network through energy-saving techniques. Sensor Network Query Processing (SNQP) is one such technique. SNQP uses information about a query and the WSN over which it is to be run, to generate an energy-efficient Query Execution Plan (QEP) that distributes processing in the form of QEP fragments to the nodes in the WSN. However, any QEP is likely to drain the batteries of the nodes unevenly, and, as a result, nodes used in a QEP may run out of energy when there are significant energy stocks still available in the WSN. An adaptive query processor could react to energy depletion, for example, by generating a revised plan that refrains from using the drained nodes. However, adapting only when a node has been depleted may provide few opportunities for the creation of effective new QEPs. In this paper, we introduce an approach that determines, at query compilation time, a sequence of QEPs with switch times for transitioning between successive plans with a view to extending the overall lifetime of the query. We describe how this approach has been implemented as an extension to an existing SNQP and present experimental results indicating that it can significantly increase QEP lifetimes.

Pages: 23:1-23:12

Citations: 3

PStore: an efficient storage framework for managing scientific data

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618268

Souvik Bhattacherjee, A. Deshpande, A. Sussman

Abstract: In this paper, we present the design, implementation, and evaluation of PStore, a no-overwrite storage framework for managing large volumes of array data generated by scientific simulations. PStore consists of two modules, a data ingestion module and a query processing module, that respectively address two of the key challenges in scientific simulation data management. The data ingestion module is geared toward handling the high volumes of simulation data generated at a very rapid rate, which often makes it impossible to offload the data onto storage devices; the module is responsible for selecting an appropriate compression scheme for the data at hand, chunking the data, and then compressing it before sending it to the storage nodes. On the other hand, the query processing module is in charge of efficiently executing different types of queries over the stored data; in this paper, we specifically focus on dicing (also called range) queries. PStore provides a suite of compression schemes that leverage, and in some cases extend, existing techniques to provide support for diverse scientific simulation data. To efficiently execute queries over such compressed data, PStore adopts and extends a two-level chunking scheme by incorporating the effect of compression, and hides expensive disk latencies for long running range queries by exploiting chunk prefetching. In addition, we also parallelize the query processing module to further speed up execution. We evaluate PStore on a 140 GB dataset obtained from real-world simulations using the regional climate model CWRF [5]. In this paper, we use both 3D and 4D datasets and demonstrate high performance through extensive experiments.

Pages: 25:1-25:12

Citations: 9

SAGA: array storage as a DB with support for structural aggregations

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618270

Yi Wang, Arnab Nandi, G. Agrawal

Abstract: In recent years, many Array DBMSs, including SciDB and RasDaMan have emerged to meet the needs of data management applications where the natural structures are the arrays. These systems, like their relational counterparts, involve an expensive data ingestion phase. The paradigm of using native storage as a DB and providing database-like support (e.g., the NoDB approach) has recently been shown to be an effective approach for dealing with infrequently queried data, where data ingestion costs cannot be justified, though only in context of relational data. Applications that generate massive arrays, such as the scientific simulations, often store the data in one of a small number of array storage formats, like NetCDF or HDF5. Thus, a natural question is, "can database-like functionality be supported over native array storage?". In this paper, we present algorithms, different partitioning strategies, and an analytical model for supporting structural (grid, sliding, hierarchical, and circular) aggregations over native array storage, and describe implementation of this approach in a system we refer to as Structural AGgregations over Array storage (SAGA). We show how the relative performance of different partitioning strategies changes with varying amount of computation in the aggregation function and different levels of data skew, and our model is effective in choosing the best partitioning strategy. Performance comparison with SciDB shows that despite working on native array storage, the aggregation costs with our system are lower. Finally, we also show that our structural aggregation implementations achieve high parallel efficiency.

Pages: 9:1-9:12

Citations: 54
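
Two of the structural aggregations named in the SAGA entry above, grid and sliding, can be illustrated on a one-dimensional array. This is a sketch under the common reading of those terms (disjoint blocks versus overlapping windows with stride 1), not SAGA's implementation. The function names are hypothetical, and the example works on in-memory lists rather than NetCDF or HDF5 files.

```python
def grid_aggregate(values, size, fn=sum):
    """Aggregate disjoint, consecutive blocks of `size` elements.
    A shorter final block is aggregated as-is."""
    return [fn(values[i:i + size]) for i in range(0, len(values), size)]


def sliding_aggregate(values, size, fn=sum):
    """Aggregate every window of `size` consecutive elements (stride 1)."""
    return [fn(values[i:i + size]) for i in range(len(values) - size + 1)]


data = [1, 2, 3, 4, 5, 6]
grid_sums = grid_aggregate(data, 2)         # blocks [1,2], [3,4], [5,6]
sliding_sums = sliding_aggregate(data, 3)   # windows [1,2,3], [2,3,4], ...
sliding_max = sliding_aggregate(data, 2, fn=max)
```

Each element contributes to exactly one grid block but to up to `size` sliding windows, which is one reason the amount of computation, and hence the preferred partitioning strategy, can differ between the two.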

Distributed data placement to minimize communication costs via graph partitioning

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618258

Lukasz Golab, Marios Hadjieleftheriou, H. Karloff, B. Saha

Abstract: With the widespread use of shared-nothing clusters of servers, there has been a proliferation of distributed object stores that offer high availability, reliability and enhanced performance for MapReduce-style workloads. However, data-intensive scientific workflows and join-intensive queries cannot always be evaluated efficiently using MapReduce-style processing without extensive data migrations, which cause network congestion and reduced query throughput. In this paper, we study the problem of computing data placement strategies that minimize the data communication costs incurred by such workloads in a distributed setting. Our main contribution is a reduction of the data placement problem to the well-studied problem of Graph Partitioning, which is NP-Hard but for which efficient approximation algorithms exist. The novelty and significance of this result lie in representing the communication cost exactly and using standard graphs instead of hypergraphs, which were used in prior work on data placement that optimized for different objectives. We study several practical extensions of the problem: with load balancing, with replication, and with complex workflows consisting of multiple steps that may be computed on different servers. We provide integer linear programs (IPs) that may be used with any IP solver to find an optimal data placement. For the no-replication case, we use publicly available graph partitioning libraries (e.g., METIS) to efficiently compute nearly-optimal solutions. For the versions with replication, we introduce two heuristics that utilize the Graph Partitioning solution of the no-replication case. Using a workload based on TPC-DS, it may take an IP solver weeks to compute an optimal data placement, whereas our reduction produces nearly-optimal solutions in seconds.

Pages: 20:1-20:12

Citations: 49
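
The graph-partitioning objective behind the data placement entry above can be made concrete with a toy example. The abstract does not give the paper's graph construction, so the sketch assumes a generic one: nodes are data objects, and an edge weight is the communication cost paid when its two endpoints are placed on different servers. `cut_cost` and `best_bipartition` are hypothetical names, and the brute-force search only stands in for a real partitioner such as METIS.

```python
from itertools import combinations


def cut_cost(edges, placement):
    """Sum the weights of edges whose endpoints sit on different servers.

    edges: iterable of (u, v, weight); placement: dict object -> server.
    """
    return sum(w for u, v, w in edges if placement[u] != placement[v])


def best_bipartition(nodes, edges):
    """Brute-force the balanced two-server placement with minimum cut
    cost. Exponential time, so only suitable for tiny inputs."""
    nodes = list(nodes)
    best = None
    for left in combinations(nodes, len(nodes) // 2):
        placement = {n: (0 if n in left else 1) for n in nodes}
        cost = cut_cost(edges, placement)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


# Hypothetical workload: a-b and c-d are heavily co-accessed.
edges = [("a", "b", 10), ("c", "d", 8), ("b", "c", 1)]
cost, placement = best_bipartition(["a", "b", "c", "d"], edges)
```

On this input the cheapest balanced placement keeps `a` with `b` and `c` with `d`, cutting only the weight-1 edge. The load-balancing constraint here is simply equal partition sizes, which is a simplification of the extensions the abstract lists.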

DivIDE: efficient diversification for interactive data exploration

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI: 10.1145/2618243.2618253

Hina A. Khan, M. Sharaf, Abdullah M. Albarrak

Abstract: Today, Interactive Data Exploration (IDE) has become a main constituent of many discovery-oriented applications, in which users repeatedly submit exploratory queries to identify interesting subspaces in large data sets. Returning relevant yet diverse results to such queries provides users with quick insights into a rather large data space. Meanwhile, search results diversification adds additional cost to an already computationally expensive exploration process. To address this challenge, in this paper, we propose a novel diversification scheme called DivIDE, which targets the problem of efficiently diversifying the results of queries posed during data exploration sessions. In particular, our scheme exploits the properties of data diversification functions while leveraging the natural overlap occurring between the results of different queries so that to provide significant reductions in processing costs. Our extensive experimental evaluation on both synthetic and real data sets shows the significant benefits provided by our scheme as compared to existing methods.

Pages: 15:1-15:12

Citations: 28