Proceedings. ACM-SIGMOD International Conference on Management of Data: Latest Articles

Fine-grained disclosure control for app ecosystems
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2467798
G. Bender, Lucja Kot, J. Gehrke, Christoph E. Koch
Abstract: The modern computing landscape contains an increasing number of app ecosystems, where users store personal data on platforms such as Facebook or smartphones. APIs enable third-party applications (apps) to utilize that data. A key concern associated with app ecosystems is the confidentiality of user data.
In this paper, we develop a new model of disclosure in app ecosystems. In contrast with previous solutions, our model is data-derived and semantically meaningful. Information disclosure is modeled in terms of a set of distinguished security views. Each query is labeled with the precise set of security views that is needed to answer it, and these labels drive policy decisions.
We explain how our disclosure model can be used in practice and provide algorithms for labeling conjunctive queries for the case of single-atom security views. We show that our approach is useful by demonstrating the scalability of our algorithms and by applying it to the real-world disclosure control system used by Facebook.
Pages: 869-880
Citations: 14
GeoDeepDive: statistical inference using familiar data-processing languages
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463680
Ce Zhang, Vidhya Govindaraju, J. Borchardt, Timothy L. Foltz, C. Ré, S. Peters
Abstract: We describe our proposed demonstration of GeoDeepDive, a system that helps geoscientists discover information and knowledge buried in the text, tables, and figures of geology journal articles. This requires solving a host of classical data management challenges including data acquisition (e.g., from scanned documents), data extraction, and data integration. SIGMOD attendees will see demonstrations of three aspects of our system: (1) an end-to-end system that is of a high enough quality to perform novel geological science, but is written by a small enough team so that each aspect can be manageably explained; (2) a simple feature engineering system that allows a user to write in familiar SQL or Python; and (3) the effect of different sources of feedback on result quality including expert labeling, distant supervision, traditional rules, and crowd-sourced data.
Our prototype builds on our work integrating statistical inference and learning tools into traditional database systems. If successful, our demonstration will allow attendees to see that data processing systems that use machine learning contain many familiar data processing problems such as efficient querying, indexing, and supporting tools for database-backed websites, none of which are machine-learning problems, per se.
Pages: 993-996
Citations: 53
Split query processing in polybase
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463709
D. DeWitt, A. Halverson, Rimma V. Nehme, S. Shankar, J. Aguilar-Saborit, Artin Avanes, Miro Flasza, J. Gramling
Abstract: This paper presents Polybase, a feature of SQL Server PDW V2 that allows users to manage and query data stored in a Hadoop cluster using the standard SQL query language. Unlike other database systems that provide only a relational view over HDFS-resident data through the use of an external table mechanism, Polybase employs a split query processing paradigm in which SQL operators on HDFS-resident data are translated into MapReduce jobs by the PDW query optimizer and then executed on the Hadoop cluster. The paper describes the design and implementation of Polybase along with a thorough performance evaluation that explores the benefits of employing a split query processing paradigm for executing queries that involve both structured data in a relational DBMS and unstructured data in Hadoop. Our results demonstrate that while the use of a split-based query execution paradigm can improve the performance of some queries by as much as 10X, one must employ a cost-based query optimizer that considers a broad set of factors when deciding whether or not it is advantageous to push a SQL operator to Hadoop. These factors include the selectivity factor of the predicate, the relative sizes of the two clusters, and whether or not their nodes are co-located. In addition, differences in the semantics of the Java and SQL languages must be carefully considered in order to avoid altering the expected results of a query.
Pages: 1255-1266
Citations: 150
GRDB: a system for declarative and interactive analysis of noisy information networks
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465257
W. E. Moustafa, Hui Miao, A. Deshpande, L. Getoor
Abstract: There is a growing interest in methods for analyzing data describing networks of all types, including biological, physical, social, and scientific collaboration networks. Typically the data describing these networks is observational, and thus noisy and incomplete; it is often at the wrong level of fidelity and abstraction for meaningful data analysis. This demonstration presents GrDB, a system that enables data analysts to write declarative programs to specify and combine different network data cleaning tasks, visualize the output, and engage in the process of decision review and correction if necessary. The declarative interface of GrDB makes it very easy to quickly write analysis tasks and execute them over data, while the visual component facilitates debugging the program and performing fine-grained corrections.
Pages: 1085-1088
Citations: 5
Noah: a dynamic ridesharing system
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463695
Charles Tian, Y. Huang, Zhi Liu, F. Bastani, R. Jin
Abstract: This demo presents Noah: a dynamic ridesharing system. Noah supports large-scale real-time ridesharing with service guarantees on road networks. Taxis and trip requests are dynamically matched. Unlike traditional systems, a taxi can have more than one customer on board, provided that all waiting time and service time constraints of trips are satisfied. Noah's real-time response relies on three main components: (1) a fast shortest path algorithm with caching on road networks; (2) fast dynamic matching algorithms to schedule ridesharing on the fly; (3) a spatial indexing method for fast retrieval of moving taxis. Users will be able to submit requests from a smartphone, choose specific parameters such as number of taxis in the system, service constraints, and matching algorithms, to explore the internal functionalities and implementations of Noah. The system analyzer will show the system performance including average waiting time, average detour percentage, average response time, and average level of sharing. Taxis, routes, and requests will be animated and visualized through the Google Maps API. The demo is based on trips of 17,000 Shanghai taxis for one day (May 29, 2009); the dataset contains 432,327 trips. Each trip includes the starting and destination coordinates and the start time. An iPhone application is implemented to allow users to submit a trip request to the Noah system during the demonstration.
Pages: 985-988
Citations: 45
Parallel analytics as a service
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463714
Petrie Wong, Zhian He, Eric Lo
Abstract: Recently, massively parallel processing relational database systems (MPPDBs) have gained much momentum in the big data analytic market. With the advent of hosted cloud computing, we envision that the offering of MPPDB-as-a-Service (MPPDBaaS) will become attractive for companies having analytical tasks on only hundreds of gigabytes to some ten terabytes of data because they can enjoy high-end parallel analytics at a cheap cost. This paper presents Thrifty, a prototype implementation of MPPDB-as-a-service. The major research issue is how to achieve a lower total cost of ownership by consolidating thousands of MPPDB tenants onto a shared hardware infrastructure, with a performance SLA that guarantees the tenants can obtain the query results as if they are executing their queries on dedicated machines. Thrifty achieves the goal by using a tenant-driven design that includes (1) a cluster design that carefully arranges the nodes in the cluster into groups and creates an MPPDB for each group of nodes, (2) a tenant placement that assigns each tenant to several MPPDBs (for high availability service through replication), and (3) a query routing algorithm that routes a tenant's query to the proper MPPDB at run-time. Experiments show that in an MPPDBaaS with 5000 tenants, where each tenant requests a 2- to 32-node MPPDB to query against 200GB to 3.2TB of data, Thrifty can serve all the tenants with a 99.9% performance SLA guarantee and a high availability replication factor of 3, using only 18.7% of the nodes requested by the tenants.
Pages: 25-36
Citations: 22
BigBench: towards an industry standard benchmark for big data analytics
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463712
A. Ghazal, T. Rabl, Minqing Hu, Francois Raab, Meikel Poess, A. Crolotte, H. Jacobsen
Abstract: There is a tremendous interest in big data by academia, industry and a large user base. Several commercial and open source providers unleashed a variety of products to support big data storage and processing. As these products mature, there is a need to evaluate and compare the performance of these systems.
In this paper, we present BigBench, an end-to-end big data benchmark proposal. The underlying business model of BigBench is a product retailer. The proposal covers a data model and synthetic data generator that addresses the variety, velocity and volume aspects of big data systems containing structured, semi-structured and unstructured data. The structured part of the BigBench data model is adopted from the TPC-DS benchmark, which is enriched with semi-structured and unstructured data components. The semi-structured part captures registered and guest user clicks on the retailer's website. The unstructured data captures product reviews submitted online. The data generator designed for BigBench provides scalable volumes of raw data based on a scale factor. The BigBench workload is designed around a set of queries against the data model. From a business perspective, the queries cover the different categories of big data analytics proposed by McKinsey. From a technical perspective, the queries are designed to span three different dimensions based on data sources, query processing types and analytic techniques.
We illustrate the feasibility of BigBench by implementing it on the Teradata Aster Database. The test includes generating and loading a 200 Gigabyte BigBench data set and testing the workload by executing the BigBench queries (written using Teradata Aster SQL-MR) and reporting their response times.
Pages: 1197-1208
Citations: 373
Trinity: a distributed graph engine on a memory cloud
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2467799
Bin Shao, Haixun Wang, Yatao Li
Abstract: Computations performed by graph algorithms are data driven, and require a high degree of random data access. Despite the great progress made in disk technology, it still cannot provide the level of efficient random access required by graph computation. On the other hand, memory-based approaches usually do not scale due to the capacity limit of single machines. In this paper, we introduce Trinity, a general purpose graph engine over a distributed memory cloud. Through optimized memory management and network communication, Trinity supports fast graph exploration as well as efficient parallel computing. In particular, Trinity leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance. These enable Trinity to support efficient online query processing and offline analytics on large graphs with just a few commodity machines. Furthermore, Trinity provides a high level specification language called TSL for users to declare data schema and communication protocols, which brings great ease-of-use for general purpose graph management and computing. Our experiments show Trinity's performance in both low latency graph queries as well as high throughput graph analytics on web-scale, billion-node graphs.
Pages: 505-516
Citations: 461
WOW: what the world of (data) warehousing can learn from the World of Warcraft
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465267
René Müller, T. Kaldewey, G. Lohman, J. McPherson
Abstract: Although originally designed to accelerate pixel monsters, graphics processing units (GPUs) have been used for some time as accelerators for selected database operations. However, to the best of our knowledge, no one has yet reported building a complete system that allows executing complex analytics queries, much less an entire data warehouse benchmark at realistic scale. In this demo, we showcase such a complete system prototype running on a high-end GPU paired with an IBM storage system that achieves >90% hardware efficiency. Our solution delivers sustainable high throughput for business analytics queries in a realistic scenario, i.e., the Star Schema Benchmark at scale factor 1,000. Attendees can interact with our system through a graphical user interface on a tablet PC. They will be able to experience firsthand how queries that require processing more than six billion rows, or 100 GB of data, are answered in less than 20 seconds. The user interface allows submitting queries, live performance monitoring of the current query all the way down to the operator level, and viewing the result once the query completes.
Pages: 961-964
Citations: 1
Bolt-on causal consistency
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465279
Peter D. Bailis, A. Ghodsi, J. Hellerstein, I. Stoica
Abstract: We consider the problem of separating consistency-related safety properties from availability and durability in distributed data stores via the application of a "bolt-on" shim layer that upgrades the safety of an underlying general-purpose data store. This shim provides the same consistency guarantees atop a wide range of widely deployed but often inflexible stores. As causal consistency is one of the strongest consistency models that remain available during system partitions, we develop a shim layer that upgrades eventually consistent stores to provide convergent causal consistency. Accordingly, we leverage widely deployed eventually consistent infrastructure as a common substrate for providing causal guarantees. We describe algorithms and shim implementations that are suitable for a large class of application-level causality relationships and evaluate our techniques using an existing, production-ready data store and with real-world explicit causality relationships.
Pages: 761-772
Citations: 228