Proceedings. ACM-SIGMOD International Conference on Management of Data: latest papers, page 6

Mobile interaction and query optimization in a protein-ligand data analysis system

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465344

Marvin Lapeine, K. Herbert, Emily Hill, N. Goodey

Citations: 1

Inter-media hashing for large-scale retrieval from heterogeneous data sources

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465274

Jingkuan Song, Yang Yang, Yi Yang, Zi-Liang Huang, Heng Tao Shen

Abstract: In this paper, we present a new multimedia retrieval paradigm to innovate large-scale search of heterogeneous multimedia data. It is able to return results of different media types from heterogeneous data sources, e.g., using a query image to retrieve relevant text documents or images from different data sources. This utilizes the widely available data from different sources and caters for the current users' demand of receiving a result list simultaneously containing multiple types of data to obtain a comprehensive understanding of the query's results. To enable large-scale inter-media retrieval, we propose a novel inter-media hashing (IMH) model to explore the correlations among multiple media types from different data sources and tackle the scalability issue. To this end, multimedia data from heterogeneous data sources are transformed into a common Hamming space, in which fast search can be easily implemented by XOR and bit-count operations. Furthermore, we integrate a linear regression model to learn hashing functions so that the hash codes for new data points can be efficiently generated. Experiments conducted on real-world large-scale multimedia datasets demonstrate the superiority of our proposed method compared with state-of-the-art techniques.

Pages: 785-796
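The abstract above notes that search in a common Hamming space reduces to XOR and bit-count operations. A minimal Python sketch of that lookup step follows. The function names, the 8-bit codes, and the item ids are hypothetical, and the sketch does not cover how IMH learns its hash functions.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the differing bits between two hash codes stored as integers."""
    # XOR leaves a 1 exactly where the two codes disagree; then count the 1s.
    return bin(a ^ b).count("1")


def nearest_codes(query: int, database: dict, k: int = 3) -> list:
    """Return the ids of the k codes closest to the query in Hamming distance."""
    ranked = sorted(
        database,
        # Ties on distance are broken by item id so the order is deterministic.
        key=lambda item_id: (hamming_distance(query, database[item_id]), item_id),
    )
    return ranked[:k]


# Hypothetical 8-bit codes for items of different media types.
database = {
    "img_1": 0b10110010,
    "txt_7": 0b10110011,
    "img_9": 0b01001101,
}
query = 0b10110110
closest = nearest_codes(query, database, k=2)
```

Because images and text documents share one code space, a single ranked list can mix both media types, which is the behavior the abstract describes.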

Citations: 517

Characterizing tenant behavior for placement and crisis mitigation in multitenant DBMSs

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465308

Aaron J. Elmore, Sudipto Das, A. Pucher, D. Agrawal, A. E. Abbadi, Xifeng Yan

Abstract: A multitenant database management system (DBMS) in the cloud must continuously monitor the trade-off between efficient resource sharing among multiple application databases (tenants) and their performance. Considering the scale of hundreds to thousands of tenants in such multitenant DBMSs, manual approaches for continuous monitoring are not tenable. A self-managing controller of a multitenant DBMS faces several challenges. For instance, how to characterize a tenant given its variety of workloads, how to reduce the impact of tenant colocation, and how to detect and mitigate a performance crisis where one or more tenants' desired service level objective (SLO) is not achieved.

We present Delphi, a self-managing system controller for a multitenant DBMS, and Pythia, a technique to learn behavior through observation and supervision using DBMS-agnostic database level performance measures. Pythia accurately learns tenant behavior even when multiple tenants share a database process, learns good and bad tenant consolidation plans (or packings), and maintains a per-tenant history to detect behavior changes. Delphi detects performance crises, and leverages Pythia to suggest remedial actions using a hill-climbing search algorithm to identify a new tenant placement strategy to mitigate violating SLOs. Our evaluation using a variety of tenant types and workloads shows that Pythia can learn a tenant's behavior with more than 92% accuracy and learn the quality of packings with more than 86% accuracy. During a performance crisis, Delphi is able to reduce 99th percentile latencies by 80%, and can consolidate 45% more tenants than a greedy baseline, which balances tenant load without modeling tenant behavior.

Pages: 517-528
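The abstract above mentions a hill-climbing search for a new tenant placement. The sketch below illustrates generic hill climbing on a toy placement problem and is not Delphi's actual algorithm: the cost function (total load above a fixed per-server capacity), the single-tenant move, and all names and numbers are assumptions made for illustration.

```python
def overload(placement, loads, capacity):
    """Total load above capacity, summed over servers (0 means no violation)."""
    totals = {}
    for tenant, server in placement.items():
        totals[server] = totals.get(server, 0.0) + loads[tenant]
    return sum(max(0.0, total - capacity) for total in totals.values())


def hill_climb(placement, loads, servers, capacity):
    """Repeatedly apply the single-tenant move that most reduces overload."""
    current = dict(placement)
    cost = overload(current, loads, capacity)
    while cost > 0:
        best_cost, best_move = cost, None
        for tenant in sorted(current):
            for server in servers:
                if server == current[tenant]:
                    continue
                candidate = dict(current)
                candidate[tenant] = server
                candidate_cost = overload(candidate, loads, capacity)
                if candidate_cost < best_cost:
                    best_cost, best_move = candidate_cost, (tenant, server)
        if best_move is None:
            break  # local optimum: no single move lowers the cost
        current[best_move[0]] = best_move[1]
        cost = best_cost
    return current, cost


# Hypothetical crisis: three tenants packed onto one server of capacity 8.
loads = {"a": 6, "b": 5, "c": 2}
start = {"a": "s1", "b": "s1", "c": "s1"}
final_placement, final_cost = hill_climb(start, loads, ["s1", "s2"], capacity=8)
```

Hill climbing stops at the first placement that no single move can improve, so it can settle in a local optimum. A real controller would also need a learned model of tenant behavior, rather than a fixed load number, to score candidate placements.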

Citations: 50

Iterative parallel data processing with Stratosphere: an inside look

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463693

Stephan Ewen, Sebastian Schelter, K. Tzoumas, Daniel Warneke, V. Markl

Abstract: Iterative algorithms occur in many domains of data analysis, such as machine learning or graph analysis. With increasing interest to run those algorithms on very large data sets, we see a need for new techniques to execute iterations in a massively parallel fashion. In prior work, we have shown how to extend and use a parallel data flow system to efficiently run iterative algorithms in a shared-nothing environment. Our approach supports the incremental processing nature of many of those algorithms.

In this demonstration proposal we illustrate the process of implementing, compiling, optimizing, and executing iterative algorithms on Stratosphere using examples from graph analysis and machine learning. For the first step, we show the algorithm's code and a visualization of the produced data flow programs. The second step shows the optimizer's execution plan choices, while the last phase monitors the execution of the program, visualizing the state of the operators and additional metrics, such as per-iteration runtime and number of updates.

To show that the data flow abstraction supports easy creation of custom programming APIs, we also present programs written against a lightweight Pregel API that is layered on top of our system with a small programming effort.

Pages: 1053-1056

Citations: 37

Efficient sentiment correlation for large-scale demographics

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465317

Mikalai Tsytsarau, S. Amer-Yahia, Themis Palpanas

Abstract: Analyzing sentiments of demographic groups is becoming important for the Social Web, where millions of users provide opinions on a wide variety of content. While several approaches exist for mining sentiments from product reviews or micro-blogs, little attention has been devoted to aggregating and comparing extracted sentiments for different demographic groups over time, such as 'Students in Italy' or 'Teenagers in Europe'. This problem demands efficient and scalable methods for sentiment aggregation and correlation, which account for the evolution of sentiment values, sentiment bias, and other factors associated with the special characteristics of web data. We propose a scalable approach for sentiment indexing and aggregation that works on multiple time granularities and uses incrementally updateable data structures for online operation. Furthermore, we describe efficient methods for computing meaningful sentiment correlations, which exploit pruning based on demographics and use top-k correlations compression techniques. We present an extensive experimental evaluation with both synthetic and real datasets, demonstrating the effectiveness of our pruning techniques and the efficiency of our solution.

Pages: 253-264
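The abstract above describes sentiment aggregation over multiple time granularities with incrementally updateable structures. One simple way to picture that idea is a running sum and count per (group, granularity, bucket) cell, as sketched below. The class, its methods, the three granularities, and the sample scores are all assumptions for illustration and are not the paper's index design.

```python
from collections import defaultdict
from datetime import date


class SentimentIndex:
    """Running sentiment averages per demographic group and time bucket."""

    def __init__(self):
        # (group, granularity, bucket) -> [sum of scores, number of scores]
        self._cells = defaultdict(lambda: [0.0, 0])

    def add(self, group: str, day: date, score: float) -> None:
        """Fold one new score into every granularity in constant time."""
        buckets = {
            "day": day.isoformat(),
            "month": day.strftime("%Y-%m"),
            "year": str(day.year),
        }
        for granularity, bucket in buckets.items():
            cell = self._cells[(group, granularity, bucket)]
            cell[0] += score
            cell[1] += 1

    def mean(self, group: str, granularity: str, bucket: str):
        """Average score for one cell, or None if nothing was recorded."""
        key = (group, granularity, bucket)
        if key not in self._cells:
            return None
        total, count = self._cells[key]
        return total / count


index = SentimentIndex()
index.add("Students in Italy", date(2013, 6, 1), 0.5)
index.add("Students in Italy", date(2013, 6, 2), -0.25)
june_mean = index.mean("Students in Italy", "month", "2013-06")
```

Storing sums and counts rather than averages is what makes each update incremental: a new score touches one cell per granularity and never requires rescanning earlier data.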

Citations: 26

Timeline index: a unified data structure for processing queries on temporal data in SAP HANA

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465293

Martin Kaufmann, Amin Amiri Manjili, Panagiotis Vagenas, Peter M. Fischer, Donald Kossmann, Franz Färber, Norman May

Citations: 91

Information preservation in statistical privacy and Bayesian estimation of unattributed histograms

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463721

Bing-Rong Lin, Daniel Kifer

Abstract: In statistical privacy, utility refers to two concepts: information preservation, meaning how much statistical information is retained by a sanitizing algorithm, and usability, meaning how (and with how much difficulty) one extracts this information to build statistical models, answer queries, etc. Some scenarios incentivize a separation between information preservation and usability, so that the data owner first chooses a sanitizing algorithm to maximize a measure of information preservation and, afterward, the data consumers process the sanitized output according to their needs [22, 46].

We analyze a variety of utility measures and show that the average (over possible outputs of the sanitizer) error of Bayesian decision makers forms the unique class of utility measures that satisfy three axioms related to information preservation. The axioms are agnostic to Bayesian concepts such as subjective probabilities and hence strengthen support for Bayesian views in privacy research. In particular, this result connects information preservation to aspects of usability: if the information preservation of a sanitizing algorithm should be measured as the average error of a Bayesian decision maker, shouldn't Bayesian decision theory be a good choice when it comes to using the sanitized outputs for various purposes? We put this idea to the test in the unattributed histogram problem where our decision-theoretic post-processing algorithm empirically outperforms previously proposed approaches.

Pages: 677-688

Citations: 25

Petabyte scale databases and storage systems at Facebook

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463713

Dhruba Borthakur

Abstract: At Facebook, we use various types of databases and storage systems to satisfy the needs of different applications. The solutions built around these data store systems have a common set of requirements: they have to be highly scalable, maintenance costs should be low and they have to perform efficiently. We use a sharded MySQL+memcache solution to support real-time access of tens of petabytes of data and we use TAO to provide consistency of this web-scale database across geographical distances. We use Haystack data store for storing the 3 billion new photos we host every week. We use Apache Hadoop to mine intelligence from 100 petabytes of click logs and combine it with the power of Apache HBase to store all Facebook Messages.

This paper describes the reasons why each of these databases is appropriate for that workload and the design decisions and tradeoffs that were made while implementing these solutions. We touch upon the consistency, availability and partitioning tolerance of each of these solutions. We touch upon the reasons why some of these systems need ACID semantics and other systems do not. We describe the techniques we have used to map the Facebook Graph Database into a set of relational tables. We speak of how we plan to do big-data deployments across geographical locations and our requirements for a new breed of pure-memory and pure-SSD based transactional database.

Esteemed researchers in the Database Management community have benchmarked query latencies on Hive/Hadoop to be less performant than a traditional Parallel DBMS. We describe why these benchmarks are insufficient for Big Data deployments and why we continue to use Hadoop/Hive. We present an alternate set of benchmark techniques that measure capacity of a database, the value/byte in that database and the efficiency of inbuilt crowd-sourcing techniques to reduce administration costs of that database.

Pages: 1267-1268

Citations: 24

Cumulon: optimizing statistical data analysis in the cloud

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465273

Botong Huang, S. Babu, Jun Yang

Citations: 82

A direct mining approach to efficient constrained graph pattern discovery

Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463723

Feida Zhu, Zequn Zhang, Qiang Qu

Abstract: Despite the wealth of research on frequent graph pattern mining, how to efficiently mine the complete set of those with constraints still poses a huge challenge to the existing algorithms mainly due to the inherent bottleneck in the mining paradigm. In essence, mining requests with explicitly-specified constraints cannot be handled in a way that is direct and precise. In this paper, we propose a direct mining framework to solve the problem and illustrate our ideas in the context of a particular type of constrained frequent patterns, the "skinny" patterns, which are graph patterns with a long backbone from which short twigs branch out. These patterns, which we formally define as l-long δ-skinny patterns, are able to reveal insightful spatial and temporal trajectory patterns in mobile data mining, information diffusion, adoption propagation, and many others.

Based on the key concept of a canonical diameter, we develop SkinnyMine, an efficient algorithm to mine all the l-long δ-skinny patterns guaranteeing both the completeness of our mining result as well as the unique generation of each target pattern. We also present a general direct mining framework together with two properties of reducibility and continuity for qualified constraints. Our experiments on both synthetic and real data demonstrate the effectiveness and scalability of our approach.

Pages: 821-832

Citations: 19