Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management: latest articles
{"title":"Protection of sensitive trajectory datasets through spatial and temporal exchange","authors":"Elham Naghizade, L. Kulik, E. Tanin","doi":"10.1145/2618243.2618278","DOIUrl":"https://doi.org/10.1145/2618243.2618278","url":null,"abstract":"Privacy concerns are a major impediment to publishing and/or exchanging trajectory data across companies and institutions. This has urged researchers to address privacy issues prior to trajectory data release. Current privacy-preserving solutions distort the original data unnecessarily and hence degrade data utility, making such data less useful for third parties. We consider a trajectory as a sequence of stops and moves, and propose an approach that exploits features of a trajectory as a means of preserving privacy while maintaining a high level of utility. We introduce the concept of sensitivity for stops based on the assumption that they are more vulnerable to privacy threats. We propose an efficient algorithm that either substitutes sensitive stop points of a trajectory with moves from the same trajectory or introduces a minimal detour if a less sensitive stop cannot be found on the same route. Our experiments show that our method balances user privacy and data utility: it protects privacy by preventing an adversary from making inferences about sensitive stops while maintaining a high level of data similarity to the original dataset.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"43 1","pages":"40:1-40:4"},"PeriodicalIF":0.0,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87105648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A provable algorithmic approach to product selection problems for market entry and sustainability","authors":"Silei Xu, Yishi Lin, Hong Xie, John C.S. Lui","doi":"10.1145/2618243.2618250","DOIUrl":"https://doi.org/10.1145/2618243.2618250","url":null,"abstract":"Given the globalized economy, processing heterogeneous web data to extract customers' purchase behavior is crucial to manufacturers who want to enter or sustain themselves in a competitive market. To maximize sales, manufacturers not only need to decide what products to produce to meet diverse customers' requirements, but at the same time must compete with competitors' products. In this paper, we present a general framework for the following product selection problems: (1) the k-BSP problem, which is for a manufacturer to enter a competitive market, and (2) the k-BBP problem, which is for a manufacturer to sustain itself in a competitive market. We propose several product adoption models to describe the complex purchase behavior of customers, and formally show that these problems are NP-hard in general. To tackle these problems, we propose computationally efficient greedy-based approximation algorithms. Based on the submodularity analysis, we prove that our algorithms guarantee a (1 - 1/e)-approximation ratio compared to the optimal solutions. We perform large-scale data analysis to show the efficiency and accuracy of our framework. In our experiments, we observe speedups of 1,300 to 250,000 times compared to the exhaustive algorithms, and our solutions achieve on average 96% of the solution quality of the optimal solutions. Finally, we apply our algorithms to a web dataset to show the impact of customers' different purchase behavior on the results of product selection.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"29 1","pages":"19:1-19:12"},"PeriodicalIF":0.0,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82806787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On efficiently generating realistic social media timeline structures","authors":"Chengcheng Yu, Fan Xia, Weining Qian, Aoying Zhou, Jianlong Chang","doi":"10.1145/2618243.2618272","DOIUrl":"https://doi.org/10.1145/2618243.2618272","url":null,"abstract":"This paper proposes a framework for a synthetic data generator that produces social media timeline structures, which is useful for benchmarking query processing over social media data and for validating hypotheses about users' behavior. It can flexibly generate synthetic data with different distributions. With the help of its asynchronized parallel processing model and delayed update strategy, it efficiently outputs timeline structures with high throughput. We show in experiments that our method can generate realistic social media timeline structures efficiently.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"15 1","pages":"45:1-45:4"},"PeriodicalIF":0.0,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89420643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A case study in optimizing continuous queries using the magic update technique","authors":"Andreas Behrend, Gereon Schüller","doi":"10.1145/2618243.2618285","DOIUrl":"https://doi.org/10.1145/2618243.2618285","url":null,"abstract":"The evaluation of continuous queries over data streams often becomes difficult as soon as static context data must be combined with dynamic stream data. This is especially the case if the context data is organized in the form of view hierarchies and thus computed from some base facts. In this scenario, typical algebraic optimization strategies fail to provide a well-optimized query evaluation plan that effectively combines the stream and classical view subparts of the given query. The Magic Update method represents a possible solution to this problem, as it allows for dynamically generating new selection conditions from the data stream, which are pushed into the view hierarchy of context data. In this paper we present a case study in which the performance gain of this technique is shown when optimizing anomaly detection views in an air-traffic surveillance scenario.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"27 1","pages":"31:1-31:4"},"PeriodicalIF":0.0,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76406929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A system for efficient and simultaneous processing of moving K nearest neighbor and spatial keyword queries","authors":"Chongsheng Zhang","doi":"10.1145/2618243.2618290","DOIUrl":"https://doi.org/10.1145/2618243.2618290","url":null,"abstract":"We study the efficient, generic processing of moving K nearest neighbor (MKNN) and top-K spatial keyword (MKSK) queries. Such generic processing is attractive during high query loads. We propose GridVoronoi, an index that enables users to find the spatial nearest neighbor (NN) in uniformly distributed datasets in almost O(1) time. GridVoronoi is based on the Voronoi diagram, which has proven to be highly efficient in exploring the local neighborhood of a given Voronoi cell. However, a Voronoi diagram needs a method to promptly find which Voronoi cell contains the query point. So we add a virtual (i.e., conceptual) grid to the Voronoi diagram. For any query point, GridVoronoi first uses the grid to compute which Voronoi cell contains the query, then uses the Voronoi diagram to quickly find the NN and KNN (i.e., K nearest neighbors) of the query. On top of GridVoronoi we introduce the UniSpatial framework, which is able to simultaneously process MKNN and MKSK queries. For each keyword, UniSpatial builds a GridVoronoi index that enables the fast retrieval of the spatial Web objects containing this keyword. UniSpatial employs the same method to process MKNN and MKSK queries, but for MKSK queries it needs to rank the retrieved objects by their proximity to the query location and textual relevance to the input keywords. In the demo, we will use real datasets to show the functionality and performance of UniSpatial.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"25 1","pages":"50:1-50:4"},"PeriodicalIF":0.0,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78223579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data perturbation for outlier detection ensembles","authors":"A. Zimek, R. Campello, J. Sander","doi":"10.1145/2618243.2618257","DOIUrl":"https://doi.org/10.1145/2618243.2618257","url":null,"abstract":"Outlier detection and ensemble learning are well-established research directions in data mining, yet the application of ensemble techniques to outlier detection has rarely been studied. Building an ensemble requires learning diverse models and combining these diverse models in an appropriate way. We propose data perturbation as a new technique to induce diversity in individual outlier detectors, as well as a rank accumulation method for combining the individual outlier rankings in order to construct an outlier detection ensemble. In an extensive evaluation, we study the impact, potential, and shortcomings of this new approach for outlier detection ensembles. We show that this ensemble can significantly improve over weakly performing base methods.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"50 1","pages":"13:1-13:12"},"PeriodicalIF":0.0,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72810350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SLACID - sparse linear algebra in a column-oriented in-memory database system","authors":"D. Kernert, F. Köhler, Wolfgang Lehner","doi":"10.1145/2618243.2618254","DOIUrl":"https://doi.org/10.1145/2618243.2618254","url":null,"abstract":"Scientific computations and analytical business applications are often based on linear algebra operations on large, sparse matrices. With the hardware shift of primary storage from disk to memory, it is now feasible to execute linear algebra queries directly in the database engine. This paper presents and compares different approaches to storing sparse matrices in an in-memory column-oriented database system. We show that a system layout derived from the compressed sparse row representation integrates well with a columnar database design and that the resulting architecture is moreover amenable to a wide range of non-numerical use cases when dictionary encoding is used. Dynamic matrix manipulation operations, like online insertion or deletion of elements, are not covered by most linear algebra frameworks. Therefore, we present a hybrid architecture that consists of a read-optimized main and a write-optimized delta structure, and evaluate the performance for dynamic sparse matrix workloads using workflows from nuclear science and network graphs.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"23 1","pages":"11:1-11:12"},"PeriodicalIF":0.0,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72900372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing evolving shapes in sensor networks","authors":"Besim Avci, Goce Trajcevski, P. Scheuermann","doi":"10.1145/2618243.2618264","DOIUrl":"https://doi.org/10.1145/2618243.2618264","url":null,"abstract":"This work addresses the problem of efficient distributed detection and tracking of mobile and evolving/deformable spatial shapes in Wireless Sensor Networks (WSN). The shapes correspond to contiguous regions bounding the locations of sensors in which the readings of the sensors satisfy a particular threshold-based criterion related to the values of a physical phenomenon that they measure. We formalize the predicates representing the shapes in such settings and present detection algorithms. In addition, we provide a light-weight protocol and aggregation methods for energy-efficient distributed execution of those algorithms. Another contribution of this work is that we developed efficient techniques for detecting a co-occurrence of shapes within a given proximity of each other. Our experiments demonstrate that, when compared to centralized techniques (that is, predicates being detected in a dedicated sink) as well as distributed periodic contour construction, our methodologies yield significant energy/communication savings.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"37 5","pages":"22:1-22:12"},"PeriodicalIF":0.0,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91551195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward efficient and reliable genome analysis using main-memory database systems","authors":"Sebastian Dorok, S. Breß, H. Läpple, G. Saake","doi":"10.1145/2618243.2618276","DOIUrl":"https://doi.org/10.1145/2618243.2618276","url":null,"abstract":"Improvements in DNA sequencing technologies make it possible to sequence complete human genomes in a short time and at acceptable cost. Hence, the vision of genome analysis as a standard procedure to support and improve medical treatment becomes reachable. In this vision paper, we describe important data-management challenges that have to be met to make this vision come true. Besides genome-analysis performance, data-management capabilities such as data provenance and data integrity become increasingly important to enable comprehensible and reliable genome analysis. We argue that these challenges can be met by using main-memory database technologies, which combine fast processing capabilities with extensive data-management capabilities. Finally, we discuss possibilities of integrating genome-analysis tasks into DBMSs and derive new research questions.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"45 1","pages":"34:1-34:4"},"PeriodicalIF":0.0,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74738892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending the SQL array concept to support scientific analytics","authors":"D. Misev, P. Baumann","doi":"10.1145/2618243.2618255","DOIUrl":"https://doi.org/10.1145/2618243.2618255","url":null,"abstract":"Arrays are among those data types which contribute the most to Big Data; examples include satellite images and weather simulation output in the Earth sciences, confocal microscopy and CAT scans in the Life sciences, as well as telescope and cosmological observations in Space science, to name but a few. Traditionally, the database community has neglected this, with the effect that ad-hoc implementations prevail. With the advent of NewSQL in recent years, however, the database scope has broadened, and array modelling and query support are seriously considered. Different models have been suggested, some of which are implemented or under implementation, and a consolidation of concepts can be observed. Consequently, integration of array queries into SQL is being addressed. We fill this gap by proposing a generic model, ASQL, for modelling and querying multi-dimensional arrays in ISO SQL. The model integrates concepts from the three major array models seen today: rasdaman, SciQL, and SciDB. It is declarative, optimizable, minimal, yet powerful enough for application domains in science, engineering, and beyond. ASQL has been implemented and is currently being discussed in ISO for extending standard SQL.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"152 1","pages":"10:1-10:11"},"PeriodicalIF":0.0,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86226236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}