Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data: latest papers

Calvin: fast distributed transactions for partitioned database systems
Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, D. Abadi
DOI: https://doi.org/10.1145/2213836.2213838
Published: 2012-05-20
Abstract: Many distributed storage systems achieve high data access throughput via partitioning and replication, each system with its own advantages and tradeoffs. In order to achieve high scalability, however, today's systems generally reduce transactional support, disallowing single transactions from spanning multiple partitions. Calvin is a practical transaction scheduling and data replication layer that uses a deterministic ordering guarantee to significantly reduce the normally prohibitive contention costs associated with distributed transactions. Unlike previous deterministic database system prototypes, Calvin supports disk-based storage, scales near-linearly on a cluster of commodity machines, and has no single point of failure. By replicating transaction inputs rather than effects, Calvin is also able to support multiple consistency levels, including Paxos-based strong consistency across geographically distant replicas, at no cost to transactional throughput.
Citations: 534
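The idea of replicating transaction inputs rather than effects, mentioned in the Calvin abstract, can be illustrated with a toy sketch. This is not Calvin's implementation: the `transfer` procedure, the `apply_log` helper, and the account names are hypothetical, and the sketch assumes that the replicas have already agreed on a single input order and that transaction logic is deterministic.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A transaction input: (name of a stored procedure, its arguments).
TxnInput = Tuple[str, Tuple]


def transfer(state: State, src: str, dst: str, amount: int) -> None:
    """Deterministic logic: move funds only if the source can cover them."""
    if state.get(src, 0) >= amount:
        state[src] = state.get(src, 0) - amount
        state[dst] = state.get(dst, 0) + amount


# Hypothetical procedure registry shared by every replica.
PROCEDURES: Dict[str, Callable[..., None]] = {"transfer": transfer}


def apply_log(initial: State, log: List[TxnInput]) -> State:
    """Replay the agreed input log, in order, on a private copy of the state."""
    state = dict(initial)
    for name, args in log:
        PROCEDURES[name](state, *args)
    return state


initial = {"alice": 100, "bob": 20}
log = [
    ("transfer", ("alice", "bob", 70)),
    ("transfer", ("alice", "bob", 70)),  # has no effect: alice holds only 30 by now
    ("transfer", ("bob", "alice", 50)),
]

# Two replicas that receive the same inputs in the same order end in the same state.
replica_a = apply_log(initial, log)
replica_b = apply_log(initial, log)

# A different order can produce a different state, which is why the order must be agreed.
reordered = apply_log(initial, list(reversed(log)))
```

Only the small input records need to be shipped between replicas in this sketch; each replica recomputes the effects locally.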
CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster
Changkyu Kim, Jongsoo Park, N. Satish, Hongrae Lee, P. Dubey, J. Chhugani
DOI: https://doi.org/10.1145/2213836.2213965
Published: 2012-05-20
Abstract: Sorting is a fundamental kernel used in many database operations. The total memory available across cloud computers is now sufficient to store even hundreds of terabytes of data in-memory. Applications requiring high-speed data analysis typically use in-memory sorting. The two most important factors in designing a high-speed in-memory sorting system are the single-node sorting performance and inter-node communication. In this paper, we present CloudRAMSort, a fast and efficient system for large-scale distributed sorting on shared-nothing clusters. CloudRAMSort performs multi-node optimizations by carefully overlapping computation with inter-node communication. The system uses a dynamic multi-stage random sampling approach for improved load-balancing between nodes. CloudRAMSort maximizes per-node efficiency by exploiting modern architectural features such as multiple cores and SIMD (Single-Instruction Multiple Data) units. This holistic combination results in the highest sorting performance on distributed shared-nothing platforms. CloudRAMSort sorts 1 Terabyte (TB) of data in 4.6 seconds on a 256-node Xeon X5680 cluster called the Intel Endeavor system. CloudRAMSort also performs well on heavily skewed input distributions, sorting 1 TB of data generated using a Zipf distribution in less than 5 seconds. We also provide a detailed analytical model that accurately projects (within avg. 7%) the performance of CloudRAMSort with varying tuple sizes and interconnect bandwidths. Our analytical model serves as a useful tool to analyze performance bottlenecks on current systems and project performance with future architectural advances. With architectural trends of increasing number of cores, bandwidth, SIMD width, cache sizes, and interconnect bandwidth, we believe CloudRAMSort would be the system of choice for distributed sorting of large-scale in-memory data on current and future systems.
Citations: 43
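The load-balancing step in the CloudRAMSort abstract relies on random sampling to decide which key range each node receives. The sketch below shows the general sample-sort idea in a single process. It is an assumption-laden simplification: it uses one sampling stage rather than the dynamic multi-stage scheme the abstract names, and `choose_splitters`, `partition`, and all parameter values are hypothetical.

```python
import bisect
import random
from typing import List


def choose_splitters(data: List[int], nodes: int, sample_size: int,
                     rng: random.Random) -> List[int]:
    """Pick nodes - 1 splitters at evenly spaced ranks of a sorted random sample."""
    sample = sorted(rng.sample(data, sample_size))
    return [sample[(i * sample_size) // nodes] for i in range(1, nodes)]


def partition(data: List[int], splitters: List[int]) -> List[List[int]]:
    """Route each key to the bucket (simulated node) that owns its key range."""
    buckets: List[List[int]] = [[] for _ in range(len(splitters) + 1)]
    for key in data:
        buckets[bisect.bisect_right(splitters, key)].append(key)
    return buckets


rng = random.Random(42)
data = [rng.randrange(1_000_000) for _ in range(10_000)]
nodes = 8

splitters = choose_splitters(data, nodes, sample_size=512, rng=rng)
buckets = partition(data, splitters)

# Each simulated node sorts locally; concatenating in node order gives the global order.
globally_sorted = [key for bucket in buckets for key in sorted(bucket)]
```

Because the splitters come from a sample of the actual keys, bucket sizes tend to stay balanced even when the key distribution is skewed, which is the property the sampling step is after.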
Fast sampling word correlations of high dimensional text data (abstract only)
Frank Rosner, Alexander Hinneburg, Martin Gleditzsch, Matthias Priebe, A. Both
DOI: https://doi.org/10.1145/2213836.2213976
Published: 2012-05-20
Abstract: Finding correlated words in large document collections is an important ingredient for text analytics. The naïve approach computes the correlations of each word against all other words and filters for highly correlated word pairs. Clearly, this quadratic method cannot be applied to real world scenarios with millions of documents and words. Our main contribution is to transform the task of finding highly correlated word pairs into a word clustering problem that is efficiently solved by locality sensitive hashing (LSH). A key insight of our new method is to note that the empirical Pearson correlation between two words is the cosine of the angle between the centered versions of their word vectors. The angle can be approximated by an LSH scheme. Although centered word vectors are not sparse, the computation of the LSH hash functions can exploit the inherent sparsity of the word data. This leads to an efficient way to detect collisions between centered word vectors having a small angle and therefore provides a fast algorithm to sample highly correlated word pairs. Our new method based on LSH improves run time complexity of the enhanced naïve algorithm. This algorithm reduces the dimensionality of the word vectors using random projection and approximates correlations by computing cosine similarity on the reduced and centered word vectors. However, this method still has quadratic run time. Our new method replaces the filtering for high correlations in the naïve algorithm with finding hash collisions, which can be done by sorting the hash values of the word vectors. We evaluate the scalability of our new algorithm to large text collections.
Citations: 0
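The key identity in the abstract above, that the empirical Pearson correlation equals the cosine of the angle between centered word vectors, can be checked numerically. The sketch below also estimates that angle with random-hyperplane signatures. This particular LSH family is an assumption, since the abstract does not say which scheme the authors use, and the word vectors, function names, and number of hyperplanes are hypothetical. The sketch does not attempt the sparsity optimization the abstract mentions.

```python
import math
import random
from typing import List, Sequence


def center(v: Sequence[float]) -> List[float]:
    """Subtract the mean so the vector sums to zero."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Textbook sums-based formula, computed without explicit centering."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx, syy = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))


def signature(v: Sequence[float], planes: List[List[float]]) -> List[bool]:
    """One bit per random hyperplane: which side of the plane the vector lies on."""
    return [sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes]


def estimated_cosine(sig_a: List[bool], sig_b: List[bool]) -> float:
    """Bits disagree with probability angle / pi, so the mismatch rate estimates the angle."""
    disagree = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * disagree / len(sig_a))


# Hypothetical per-document occurrence counts for two words.
word_a = [1, 0, 2, 0, 3, 1, 0, 2]
word_b = [1, 0, 1, 0, 2, 1, 0, 3]

rng = random.Random(0)
planes = [[rng.gauss(0, 1) for _ in word_a] for _ in range(4000)]

exact = pearson(word_a, word_b)
via_cosine = cosine(center(word_a), center(word_b))
approx = estimated_cosine(signature(center(word_a), planes),
                          signature(center(word_b), planes))
```

In a full pipeline one would bucket words by signature and compare only words that collide, instead of comparing every pair.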
Just-in-time information extraction using extraction views
Amr El-Helw, Mina H. Farid, I. Ilyas
DOI: https://doi.org/10.1145/2213836.2213913
Published: 2012-05-20
Citations: 10
Interactive regret minimization
Danupon Nanongkai, Ashwin Lall, Atish Das Sarma, K. Makino
DOI: https://doi.org/10.1145/2213836.2213850
Published: 2012-05-20
Abstract: We study the notion of regret ratio proposed in Nanongkai et al. [VLDB10] to deal with multi-criteria decision making in database systems. The regret minimization query proposed in Nanongkai et al. was shown to have features of both skyline and top-k: it does not need information from the user but still controls the output size. While this approach is suitable for obtaining a reasonably small regret ratio, it is still open whether one can make the regret ratio arbitrarily small. Moreover, it remains open whether reasonable questions can be asked to the users in order to improve efficiency of the process. In this paper, we study the problem of minimizing regret ratio when the system is enhanced with interaction. We assume that when presented with a set of tuples the user can tell which tuple is most preferred. Under this assumption, we develop the problem of interactive regret minimization where we fix the number of questions and tuples per question that we can display, and aim at minimizing the regret ratio. We try to answer two questions in this paper: (1) How much does interaction help? That is, how much can we improve the regret ratio when there are interactions? (2) How efficient can interaction be? In particular, we measure how many questions we have to ask the user in order to make her regret ratio small enough. We answer both questions from both theoretical and practical standpoints. For the first question, we show that interaction can reduce the regret ratio almost exponentially. To do this, we prove a lower bound for the previous approach (thereby resolving an open problem from Nanongkai et al.), and develop an almost-optimal upper bound that makes the regret ratio exponentially smaller. Our experiments also confirm that, in practice, interactions help in improving the regret ratio by many orders of magnitude. For the second question, we prove that when our algorithm shows a reasonable number of points per question, it only needs a few questions to make the regret ratio small. Thus, interactive regret minimization seems to be a necessary and sufficient way to deal with multi-criteria decision making in database systems.
Citations: 51
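The abstract above uses the regret ratio without defining it. The sketch below assumes the common formulation: for a user with a linear utility function, the regret ratio of a displayed subset is the relative utility lost by choosing the best displayed tuple instead of the best tuple in the whole database. The tuples, weights, and function names are hypothetical, and `user_choice` only simulates the single interaction the abstract assumes (the user points at the preferred tuple). It is not the paper's algorithm.

```python
from typing import List, Sequence

Point = Sequence[float]


def utility(weights: Point, point: Point) -> float:
    """Linear utility: a weighted sum of the tuple's attribute values."""
    return sum(w * x for w, x in zip(weights, point))


def regret_ratio(database: List[Point], shown: List[Point], weights: Point) -> float:
    """Relative utility lost when the user can only pick among the shown tuples."""
    best_overall = max(utility(weights, p) for p in database)
    best_shown = max(utility(weights, p) for p in shown)
    return (best_overall - best_shown) / best_overall


def user_choice(shown: List[Point], weights: Point) -> Point:
    """Simulated answer to one question: the shown tuple the user prefers most."""
    return max(shown, key=lambda p: utility(weights, p))


database = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.9, 0.3)]
shown = [(1.0, 0.0), (0.0, 1.0)]

balanced_user = (0.5, 0.5)
ratio = regret_ratio(database, shown, balanced_user)   # best overall 0.6, best shown 0.5
no_regret = regret_ratio(database, database, balanced_user)

# A user who weights the first attribute heavily picks the first shown tuple.
answer = user_choice(shown, (0.8, 0.2))
```

Each answer tells the system something about the hidden weights, which is what lets later questions show tuples with a smaller regret ratio.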
Managing and mining large graphs: patterns and algorithms
C. Faloutsos, U. Kang
DOI: https://doi.org/10.1145/2213836.2213906
Published: 2012-05-20
Abstract: Graphs are everywhere: social networks, the World Wide Web, biological networks, and many more. The sizes of graphs are growing at an unprecedented rate, spanning millions and billions of nodes and edges. What are the patterns in large graphs, spanning Giga, Tera, and heading toward Peta bytes? What are the best tools, and how can they help us solve graph mining problems? How do we scale up algorithms for handling graphs with billions of nodes and edges? These are exactly the goals of this tutorial. We start with the patterns in real-world static, weighted, and dynamic graphs. Then we describe important tools for large graph mining, including singular value decomposition, and Hadoop. Finally, we conclude with the design and the implementation of scalable graph mining algorithms on Hadoop. This tutorial is complementary to the related tutorial "Managing and Mining Large Graphs: Systems and Implementations".
Citations: 11
Tiresias: a demonstration of how-to queries
A. Meliou, Yisong Song, Dan Suciu
DOI: https://doi.org/10.1145/2213836.2213939
Published: 2012-05-20
Abstract: In this demo, we will present Tiresias, the first how-to query engine. How-to queries represent fundamental data analysis questions of the form: "How should the input change in order to achieve the desired output". They exemplify an important Reverse Data Management problem: solving constrained optimization problems over data residing in a DBMS. Tiresias, named after the mythical oracle of Thebes, has complex under-workings, but includes a simple interface that allows users to load datasets and interactively design optimization problems by simply selecting actions, key performance indicators, and objectives. The user choices are translated into a declarative query, which is then processed by Tiresias and translated into a Mixed Integer Program; we then use an MIP solver to find a solution. The solution is then presented to the user as an interactive data instance. The user can provide feedback by rejecting certain tuples and/or values. Then, based on the user feedback, Tiresias automatically refines the how-to query and presents a new set of results.
Citations: 3
Parallel main-memory indexing for moving-object query and update workloads
Darius Sidlauskas, Simonas Šaltenis, Christian S. Jensen
DOI: https://doi.org/10.1145/2213836.2213842
Published: 2012-05-20
Abstract: We are witnessing a proliferation of Internet-worked, geo-positioned mobile devices such as smartphones and personal navigation devices. Likewise, location-related services that target the users of such devices are proliferating. Consequently, server-side infrastructures are needed that are capable of supporting the location-related query and update workloads generated by very large populations of such moving objects. This paper presents a main-memory indexing technique that aims to support such workloads. The technique, called PGrid, uses a grid structure that is capable of exploiting the parallelism offered by modern processors. Unlike earlier proposals that maintain separate structures for updates and queries, PGrid allows both long-running queries and rapid updates to operate on a single data structure and thus offers up-to-date query results. Because PGrid does not rely on creating snapshots, it avoids the stop-the-world problem that occurs when workload processing is interrupted to perform such snapshotting. Its concurrency control mechanism relies instead on hardware-assisted atomic updates as well as object-level copying, and it treats updates as non-divisible operations rather than as combinations of deletions and insertions; thus, the query semantics guarantee that no objects are missed in query results. Empirical studies demonstrate that PGrid scales near-linearly with the number of hardware threads on four modern multi-core processors. Since both updates and queries are processed on the same current data-store state, PGrid outperforms snapshot-based techniques in terms of both query freshness and CPU cycle-wise efficiency.
Citations: 61
ColumbuScout: towards building local search engines over large databases
Cody Hansen, Feifei Li
DOI: https://doi.org/10.1145/2213836.2213914
Published: 2012-05-20
Abstract: In many database applications, search is still executed via form-based query interfaces, which are then translated into SQL statements to find matching records. Ranking is usually not implemented unless users have explicitly indicated how to rank the matching records, e.g., in ascending order of year. Often, this approach is neither intuitive nor user friendly (especially with many search fields in a query form). It also requires application developers to design schema-specific query forms and develop specific programs that understand these forms. In this work, we propose to demonstrate the ColumbuScout system, which aims at quickly building and deploying a local search engine over one or more large databases. The ColumbuScout system adopts a search-engine-style approach for searches over local databases. It introduces its own indexing structures and storage designs to improve its overall efficiency and scalability. We will demonstrate that it is simple for application developers to deploy ColumbuScout over any database, and that ColumbuScout is able to support search-engine-like types of search over large databases efficiently and effectively.
Citations: 2
TreeSpan: efficiently computing similarity all-matching
Gaoping Zhu, Xuemin Lin, Ke Zhu, W. Zhang, J. Yu
DOI: https://doi.org/10.1145/2213836.2213896
Published: 2012-05-20
Abstract: Given a query graph q and a data graph G, computing all occurrences of q in G, namely exact all-matching, is fundamental in graph data analysis with a wide spectrum of real applications. It is challenging since even finding one occurrence of q in G (subgraph isomorphism test) is NP-Complete. Consider that in many real applications, exploratory queries from users are often too inaccurate to express their real demands. In this paper, we study the problem of efficiently computing all approximate occurrences of q in G. Particularly, we study the problem of efficiently retrieving all matches of q in G with the number of possible missing edges bounded by a given threshold θ, namely similarity all-matching. The problem of similarity all-matching is harder than the problem of exact all-matching since it covers the problem of exact all-matching as a special case with θ = 0. In this paper, we develop a novel paradigm to conduct similarity all-matching. Specifically, we propose to use a minimal set QT of spanning trees in q to cover all connected subgraphs q' of q missing at most θ edges; that is, each q' is spanned by a spanning tree in QT. Then, we conduct exact all-matching for each spanning tree in QT to induce all similarity matches. A rigorous theoretical analysis shows that our new search paradigm significantly reduces the number of exact all-matching runs compared with existing techniques. To further speed up the computation, we develop new filtering, computation sharing, and search ordering techniques. Our comprehensive experiments on both real and synthetic datasets demonstrate that our techniques outperform the state-of-the-art technique by 7 orders of magnitude.
Citations: 50
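To make the search space in the TreeSpan abstract concrete, the sketch below enumerates the connected subgraphs q' obtained from a small query graph by dropping at most θ edges while keeping every vertex. It is a brute-force illustration only: it does not compute the minimal spanning-tree set QT or perform any matching, and the example graph and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def is_connected(vertices: Set[int], edges: Sequence[Edge]) -> bool:
    """Depth-first search over an undirected graph given as an edge list."""
    adjacency: Dict[int, Set[int]] = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return len(seen) == len(vertices)


def relaxed_queries(vertices: Set[int], edges: Sequence[Edge],
                    theta: int) -> List[List[Edge]]:
    """All connected subgraphs on the full vertex set that miss at most theta edges."""
    result = []
    for missing in range(theta + 1):
        for dropped in combinations(edges, missing):
            kept = [e for e in edges if e not in dropped]
            if is_connected(vertices, kept):
                result.append(kept)
    return result


# Hypothetical query graph: a 4-cycle with one chord.
q_vertices = {0, 1, 2, 3}
q_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

exact_only = relaxed_queries(q_vertices, q_edges, theta=0)
one_missing = relaxed_queries(q_vertices, q_edges, theta=1)
two_missing = relaxed_queries(q_vertices, q_edges, theta=2)
```

Every subgraph returned here is connected on all four vertices, so it contains at least one spanning tree of q. That is the property which, per the abstract, lets a set of spanning trees stand in for the whole family of relaxed queries.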