Proceedings. ACM-SIGMOD International Conference on Management of Data: Latest Articles

Protecting Data Markets from Strategic Buyers
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date: 2022-01-01 DOI: 10.1145/3514221.3517855
R. Fernandez
Pages: 1755-1769
Citations: 0
XLJoins
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date: 2021-01-01 DOI: 10.1145/3448016.3450582
A. Shanghooshabad
Abstract: Figure 1: An XLJoin example (QX from the TPC-H benchmark). The structure-learning component receives a join query, metadata, tables, and existing models, and builds an MRF graph based on the query. While the JAs (nodes shown in black) are being inferred, a BN is built. Finally, a uniform sample of JAs is generated using ancestral sampling, starting from the root and proceeding to the leaves. Non-JAs (blue nodes) are added using the MRF once the JAs have been sampled from the BN, because they do not affect uniformity.
Pages: 2902-2904
Citations: 0
Convergence of Array DBMS and Cellular Automata: A Road Traffic Simulation Case
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date: 2021-01-01 DOI: 10.1145/3448016.3458457
R. A. R. Zalipynis
Pages: 2399-2403
Citations: 0
Finding Related Tables in Data Lakes for Interactive Data Science.
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date: 2020-06-01 DOI: 10.1145/3318464.3389726
Yi Zhang, Zachary G Ives
Abstract: Many modern data science applications build on data lakes, schema-agnostic repositories of data files and data products that offer limited organization and management capabilities. There is a need to build data lake search capabilities into data science environments, so scientists and analysts can find tables, schemas, workflows, and datasets useful to their task at hand. We develop search and management solutions for the Jupyter Notebook data science platform, to enable scientists to augment training data, find potential features to extract, clean data, and find joinable or linkable tables. Our core methods also generalize to other settings where computational tasks involve execution of programs or scripts.
Pages: 1951-1966
Citations: 75
Near-Optimal Distributed Band-Joins through Recursive Partitioning.
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date: 2020-06-01 DOI: 10.1145/3318464.3389750
Rundong Li, Wolfgang Gatterbauer, Mirek Riedewald
Abstract: We consider running-time optimization for band-joins in a distributed system, e.g., the cloud. To balance load across worker machines, input has to be partitioned, which causes duplication. We explore how to resolve this tension between maximum load per worker and input duplication for band-joins between two relations. Previous work suffered from high optimization cost or considered partitionings that were too restricted (resulting in suboptimal join performance). Our main insight is that recursive partitioning of the join-attribute space with the appropriate split scoring measure can achieve both low optimization cost and low join cost. It is the first approach that is effective not only for one-dimensional band-joins but also for joins on multiple attributes. Experiments indicate that our method is able to find partitionings that are within 10% of the lower bound for both maximum load per worker and input duplication for a broad range of settings, significantly improving over previous work.
Pages: 2375-2390
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7872589/pdf/nihms-1666242.pdf
Citations: 0
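The tension between maximum load per worker and input duplication named in the abstract above can be made concrete with a small sketch. This is not the paper's recursive-partitioning method: it is a naive one-dimensional range partitioning, and the function names, the choice to duplicate only the first input, and the definition of load as the number of input values assigned to a worker are all assumptions made for illustration.

```python
from bisect import bisect_right


def band_join(r, s, eps):
    """Reference answer: every pair (x, y) with |x - y| <= eps."""
    return sorted((x, y) for x in r for y in s if abs(x - y) <= eps)


def partition_inputs(r, s, eps, boundaries):
    """Range-partition s on sorted boundaries; copy each r value to every
    partition that could hold a matching s value."""
    n_parts = len(boundaries) + 1
    r_parts = [[] for _ in range(n_parts)]
    s_parts = [[] for _ in range(n_parts)]
    for y in s:
        s_parts[bisect_right(boundaries, y)].append(y)
    for x in r:
        first = bisect_right(boundaries, x - eps)
        last = bisect_right(boundaries, x + eps)
        for p in range(first, last + 1):
            r_parts[p].append(x)  # x is duplicated whenever first < last
    return r_parts, s_parts


def partitioned_band_join(r, s, eps, boundaries):
    """Join each partition locally and report the two cost measures."""
    r_parts, s_parts = partition_inputs(r, s, eps, boundaries)
    pairs = []
    for rp, sp in zip(r_parts, s_parts):
        pairs.extend(band_join(rp, sp, eps))
    # Assumed load measure: input values assigned to one worker.
    max_load = max(len(rp) + len(sp) for rp, sp in zip(r_parts, s_parts))
    duplication = sum(len(rp) for rp in r_parts) - len(r)
    return sorted(pairs), max_load, duplication


r = [1, 5, 9, 10, 14]
s = [2, 6, 10, 11, 20]
pairs, max_load, duplication = partitioned_band_join(r, s, eps=1, boundaries=[10])
assert pairs == band_join(r, s, eps=1)  # partitioning must not change the answer
print(max_load, duplication)
```

Adding a boundary lowers the largest per-worker input but forces values within eps of that boundary to be copied to both sides, which is the trade-off a split-scoring measure has to weigh.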
Optimal Join Algorithms Meet Top-k.
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date: 2020-06-01 DOI: 10.1145/3318464.3383132
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Abstract: Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality results exist, e.g., the famous Threshold Algorithm, they hold only in a fairly limited model of computation that does not account for the cost incurred by large intermediate results and hence is not aligned with typical database-optimizer cost models. On the other hand, the idea of avoiding large intermediate results is arguably the main goal of recent work on optimal join algorithms, which uses the standard RAM model of computation to determine algorithm complexity. This research has created a lot of excitement due to its promise of reducing the time complexity of join queries with cycles, but it has mostly focused on full-output computation. We argue that the two areas can and should be studied from a unified point of view in order to achieve optimality in the common model of computation for a very general class of top-k-style join queries. This tutorial has two main objectives. First, we will explore and contrast the main assumptions, concepts, and algorithmic achievements of the two research areas. Second, we will cover recent, as well as some older, approaches that emerged at the intersection to support efficient ranked enumeration of join-query results. These are related to classic work on k-shortest path algorithms and more general optimization problems, some of which dates back to the 1950s. We demonstrate that this line of research warrants renewed attention in the challenging context of ranked enumeration for general join queries.
Pages: 2659-2665
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7872590/pdf/nihms-1666240.pdf
Citations: 0
Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances.
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date: 2019-06-01 DOI: 10.1145/3299869.3300066
Zhengjie Miao, Qitian Zeng, Boris Glavic, Sudeepa Roy
Abstract: Provenance and intervention-based techniques have been used to explain surprisingly high or low outcomes of aggregation queries. However, such techniques may miss interesting explanations emerging from data that is not in the provenance. For instance, an unusually low number of publications of a prolific researcher in a certain venue and year can be explained by an increased number of publications in another venue in the same year. We present a novel approach for explaining outliers in aggregation queries through counter-balancing. That is, explanations are outliers in the opposite direction of the outlier of interest. Outliers are defined w.r.t. patterns that hold over the data in aggregate. We present efficient methods for mining such aggregate regression patterns (ARPs), discuss how to use ARPs to generate and rank explanations, and experimentally demonstrate the efficiency and effectiveness of our approach.
Pages: 485-502
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6980245/pdf/nihms-1030948.pdf
Citations: 0
iQCAR: inter-Query Contention Analyzer for Data Analytics Frameworks.
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date: 2019-06-01 DOI: 10.1145/3299869.3319904
Prajakta Kalmegh, Shivnath Babu, Sudeepa Roy
Abstract: Resource interference caused by concurrent queries is one of the key reasons for unpredictable performance and missed workload SLAs in cluster computing systems. Analyzing these inter-query resource interactions is critical in order to answer time-sensitive questions like 'who is creating resource conflicts to my query'. More importantly, diagnosing whether the resource blocked times of a 'victim' query are caused by other queries or some other external factor can help the database administrator narrow down the many possibilities of query performance degradation. We introduce iQCAR, an inter-Query Contention Analyzer, that attributes blame for the slowdown of a query to concurrent queries. iQCAR models the resource conflicts using a multi-level directed acyclic graph that can help administrators compare impacts from concurrent queries and identify the most contentious queries, resources, and hosts in an online execution for a selected time window. Our experiments using TPCDS queries on Apache Spark show that our approach is substantially more accurate than other methods based on overlap time between concurrent queries.
Pages: 918-935
Citations: 7
RATest: Explaining Wrong Relational Queries Using Small Examples.
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date: 2019-06-01 DOI: 10.1145/3299869.3320236
Zhengjie Miao, Sudeepa Roy, Jun Yang
Abstract: We present a system called RATEST, designed to help debug relational queries against reference queries and test database instances. In many applications, e.g., classroom learning and regression testing, we test the correctness of a user query Q by evaluating it over a test database instance D and comparing its result with that of evaluating a reference (correct) query Q0 over D. If Q(D) differs from Q0(D), the user knows Q is incorrect. However, D can be large (often by design), which makes debugging Q difficult. The key idea behind RATEST is to show the user a much smaller database instance D' ⊆ D, which we call a counterexample, such that Q(D') ≠ Q0(D'). RATEST builds on data provenance and constraint solving, and employs a suite of techniques to support, at interactive speed, complex queries involving differences and group-by aggregation. We demonstrate an application of RATEST in learning: it has been used successfully by a large undergraduate database course in a university to help students with a relational algebra assignment.
Pages: 1961-1964
Citations: 5
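The counterexample idea in the abstract above, a small D' ⊆ D with Q(D') ≠ Q0(D'), can be illustrated with a brute-force search. This is only a sketch of the problem statement: the abstract says RATEST builds on data provenance and constraint solving, whereas the code below simply enumerates subsets in order of size, which is exponential in |D|. The function name, the toy table, and the two queries are hypothetical.

```python
from itertools import combinations


def smallest_counterexample(db, q, q0):
    """Return a smallest subset d of db with q(d) != q0(d), or None if the
    two queries agree on every subset (brute force, exponential in len(db))."""
    for size in range(len(db) + 1):
        for subset in combinations(db, size):
            d = list(subset)
            if q(d) != q0(d):
                return d
    return None


# Hypothetical test instance: rows are (name, dept, salary).
employees = [("ann", "cs", 70), ("bob", "ee", 50), ("cy", "cs", 30)]


def reference_query(rows):
    """Q0: names of employees earning strictly more than 50."""
    return {name for name, _dept, salary in rows if salary > 50}


def user_query(rows):
    """Q: a buggy attempt that uses >= instead of >."""
    return {name for name, _dept, salary in rows if salary >= 50}


witness = smallest_counterexample(employees, user_query, reference_query)
print(witness)
```

A single-row witness isolates the boundary case where the two queries disagree, which is easier to inspect than the full instance.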
iQCAR: A Demonstration of an Inter-Query Contention Analyzer for Cluster Computing Frameworks.
Proceedings. ACM-SIGMOD International Conference on Management of Data Pub Date: 2018-06-01 DOI: 10.1145/3183713.3193567
Prajakta Kalmegh, Harrison Lundberg, Frederick Xu, Shivnath Babu, Sudeepa Roy
Abstract: Unpredictability in query runtimes can arise in a shared cluster as a result of resource contentions caused by inter-query interactions. iQCAR (inter Query Contention AnalyzeR) is a system that formally models these interferences between concurrent queries and provides a framework to attribute blame for contentions. iQCAR leverages a multi-level directed acyclic graph called iQC-Graph to diagnose the aberrations in query schedules that lead to these resource contentions. The demonstration will enable users to perform a step-wise deep exploration of such resource contentions faced by a query at various stages of its execution. The interface will allow users to identify top-k victims and sources of contentions, diagnose high-contention nodes and resources in the cluster, and rank their impacts on the performance of a query. Users will also be able to navigate through a set of rules recommended by iQCAR to compare how application of each rule by the cluster scheduler resolves the contentions in subsequent executions.
Pages: 1721-1724
Citations: 6