ACM SIGMOD Record: Latest Articles

Auto-Tables: Relationalize Tables without Using Examples
ACM SIGMOD Record Pub Date : 2024-05-14 DOI: 10.1145/3665252.3665269
Peng Li, Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri
Abstract: Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases. However, such a standard cannot be taken for granted when dealing with tables "in the wild". Our survey of real spreadsheet-tables and web-tables shows that over 30% of such tables do not conform to the relational standard, for which complex table-restructuring transformations are needed before these tables can be queried easily using SQL-based tools. Unfortunately, the required transformations are non-trivial to program, which has become a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Tableau forums.
Citations: 0
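To make the term "table-restructuring transformation" concrete, here is a minimal sketch of one such transformation, a wide-to-long unpivot, written in plain Python. It is not the Auto-Tables method, which the abstract describes only at a high level; the table contents and the names `wide_rows` and `unpivot` are hypothetical and chosen purely for illustration.

```python
# Hypothetical non-relational table: one row per product, one column per year.
wide_rows = [
    {"product": "A", "2022": 10, "2023": 12},
    {"product": "B", "2022": 7, "2023": 9},
]


def unpivot(rows, id_column, var_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on each output row instead
            long_rows.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return long_rows


# Relational form: one row per (product, year) observation.
long_rows = unpivot(wide_rows, "product", "year", "sales")
for row in long_rows:
    print(row)
```

In the long form each row is a single observation and each column a single attribute, so a SQL filter such as `WHERE year = '2023'` becomes straightforward.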
From Binary Join to Free Join
ACM SIGMOD Record Pub Date : 2024-05-14 DOI: 10.1145/3665252.3665259
Y. Wang, Max Willsey, Dan Suciu
Abstract: Over the last decade, worst-case optimal join (WCOJ) algorithms have emerged as a new paradigm for one of the most fundamental challenges in query processing: computing joins efficiently. Such algorithms can be asymptotically faster than traditional binary joins, all the while remaining simple to understand and implement. However, they have been found to be less efficient than the old paradigm, traditional binary join plans, on the typical acyclic queries found in practice. In an effort to unify and generalize the two paradigms, we proposed a new framework, called Free Join, in our SIGMOD 2023 paper. Not only does Free Join unite the worlds of traditional and worst-case optimal join algorithms, it uncovers optimizations and evaluation strategies that outperform both.
In this article, we approach Free Join from the traditional perspective of binary joins, and re-derive the more general framework via a series of gradual transformations. We hope this perspective from the past can help practitioners better understand the Free Join framework, and find ways to incorporate some of the ideas into their own systems.
Citations: 0
Technical Perspective: Efficient and Reusable Lazy Sampling
ACM SIGMOD Record Pub Date : 2024-05-14 DOI: 10.1145/3665252.3665260
Thomas Neumann
Abstract: When interactively working with data, query latency is very important. In particular, when ad hoc queries are written in an exploratory manner, it is essential to get feedback quickly in order to refine and correct the query based upon result values. This interactive use case is difficult to support if the underlying data is large, as analyzing large volumes of data is inherently expensive.
Citations: 0
Efficient and Reusable Lazy Sampling
ACM SIGMOD Record Pub Date : 2024-05-14 DOI: 10.1145/3665252.3665261
Viktor Sanca, Periklis Chrysogelos, Anastasia Ailamaki
Abstract: Modern analytical engines rely on Approximate Query Processing (AQP) to provide faster response times than the hardware allows for exact query answering. However, existing AQP methods impose steep performance penalties as workload unpredictability increases. While offline AQP relies on predictable workloads to a priori create samples that match the queries, as soon as workload predictability diminishes, returning to existing online AQP methods that create query-specific samples with little reuse across queries results in significantly smaller gains in response times. As a result, existing approaches cannot fully exploit the benefits of sampling under increased unpredictability.
We propose LAQy, a framework for building, expanding, and merging samples to adapt to the changes in workload predicates. We propose lazy sampling to overcome the unpredictability issues that cause fast-but-specialized samples to be query-specific and design it for a scale-up analytical engine to show the adaptivity and practicality of our framework in a modern system. LAQy speeds up online sampling processing as a function of data access and computation reuse, making sampler placement after expensive operators more practical.
Citations: 0
DBSP: Incremental Computation on Streams and Its Applications to Databases
ACM SIGMOD Record Pub Date : 2024-05-14 DOI: 10.1145/3665252.3665271
Mihai Budiu, Tej Chajed, Frank McSherry, Leonid Ryzhyk, V. Tannen
Abstract: We describe DBSP, a framework for incremental computation. Incremental computations repeatedly evaluate a function on some input values that are "changing". The goal of an efficient implementation is to "reuse" previously computed results. Ideally, when presented with a new change to the input, an incremental computation should only perform work proportional to the size of the changes of the input, rather than to the size of the entire dataset.
Citations: 0
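The goal stated in this abstract, doing work proportional to the size of the change rather than the size of the dataset, can be illustrated with a small Python sketch of an incrementally maintained grouped count. This is not DBSP's API or algorithm; the class `IncrementalGroupCount` and the encoding of changes as `(key, weight)` pairs (with +1 for an inserted row and -1 for a deleted row) are assumptions made for illustration.

```python
from collections import Counter


class IncrementalGroupCount:
    """Maintains COUNT(*) GROUP BY key from input changes only (illustrative)."""

    def __init__(self):
        # Previously computed result, reused on every update.
        self.counts = Counter()

    def apply(self, changes):
        """Apply (key, weight) changes: +1 inserts a row, -1 deletes one.

        The loop touches only the changed keys, so the cost depends on
        len(changes), not on how many rows have been seen so far.
        """
        for key, weight in changes:
            self.counts[key] += weight
            if self.counts[key] == 0:
                del self.counts[key]  # drop groups that became empty
        return dict(self.counts)


view = IncrementalGroupCount()
first = view.apply([("a", +1), ("b", +1), ("a", +1)])
second = view.apply([("a", -1), ("c", +1)])
print(first)
print(second)
```

The second call processes two changes and never rescans the three rows from the first batch, which is the "reuse" of previously computed results that the abstract refers to.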
Technical Perspective: Synthetic Data Needs a Reproducibility Benchmark
ACM SIGMOD Record Pub Date : 2024-05-14 DOI: 10.1145/3665252.3665266
Xi He
Abstract: Synthetic data is a vital substitute for real sensitive personal data in supporting social science research and policy studies. Extensive prior research has delved into various models for generating synthetic data, from traditional statistical approaches to cutting-edge deep-learning methods. However, selecting the most suitable model for unforeseen applications poses a significant challenge, because the models' strengths and weaknesses vary with factors such as the application domain, data distribution, analytical requirements, and privacy considerations.
Citations: 0
Technical Perspective on 'Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch'
ACM SIGMOD Record Pub Date : 2024-05-14 DOI: 10.1145/3665252.3665254
Graham Cormode
Abstract: The topics of private data analysis and streaming data management have each been the focus of much study within the data management community for many years. However, more recently there have been studies which bring these two previously isolated topics together.
Citations: 0
Unicorn: A Unified Multi-Tasking Matching Model
ACM SIGMOD Record Pub Date : 2024-05-14 DOI: 10.1145/3665252.3665263
Ju Fan, Jianhong Tu, Guoliang Li, Peng Wang, Xiaoyong Du, Xiaofeng Jia, Song Gao, Nan Tang
Abstract: Data matching, which decides whether two data elements (e.g., string, tuple, column, or knowledge graph entity) are the "same" (a.k.a. a match), is a key concept in data integration. The widely used practice is to build task-specific or even dataset-specific solutions, which are hard to generalize and forgo the opportunity to share knowledge learned from different datasets and multiple tasks. In this paper, we propose Unicorn, a unified model for generally supporting common data matching tasks. Building such a unified model is challenging due to heterogeneous formats of input data elements and various matching semantics of multiple tasks. To address the challenges, Unicorn employs one generic Encoder that converts any pair of data elements (a, b) into a learned representation, and uses a Matcher, which is a binary classifier, to decide whether a matches b. To align matching semantics of multiple tasks, Unicorn adopts a mixture-of-experts model that enhances the learned representation into a better representation. We conduct extensive experiments using 20 datasets on 7 well-studied data matching tasks, and find that our unified model can achieve better performance on most tasks and on average, compared with the state-of-the-art specific models trained for ad-hoc tasks and datasets separately. Moreover, Unicorn can also serve new matching tasks well with zero-shot learning.
Citations: 0
Technical Perspective: Graph Theory for Data Privacy: A New Approach for Complex Data Flows
ACM SIGMOD Record Pub Date : 2024-05-14 DOI: 10.1145/3665252.3665264
Elena Ferrari
Abstract: Nearly all of the world's population now uses online services that request personal information, covering almost every aspect of our lives. The abundance of personal data in digital form has brought incredible benefits to end users, enabling them to access personalized and advanced services based on the analysis of the data collected. This capability has dramatically improved the user experience in various application domains, ranging from healthcare to e-commerce, finance, logistics, and entertainment, to name a few. Numerous technological advancements in the field of big data have enabled this massive processing of personal data, and recent advances in AI data processing capabilities will expand the ways in which service providers will use personal data in the coming years. Machine learning algorithms, powered by AI, will be used to make increasingly accurate predictions about user behavior by uncovering hidden correlations within massive data sets. There is therefore a tension between the desire to fully exploit personal data in such ecosystems and the need to provide strong privacy and transparency guarantees to the individuals whose data is being exploited. Privacy protection is further complicated because data processing is typically not performed in isolation but through pipelines of different services, with each step making inferences about the personal data consumed by the services in subsequent steps.
Citations: 0
Learning to Restructure Tables Automatically
ACM SIGMOD Record Pub Date : 2024-05-14 DOI: 10.1145/3665252.3665268
J. M. Hellerstein
Abstract: By now, it is widely accepted folk wisdom that "half of the time in any data analysis project is spent wrangling the data". Analytic algorithms and tools, built on mathematical foundations of matrices and relations, require their data to be lined up in particular rows and columns. In the relational model (known in data science circles as "tidy data"), each row is an independent observation, and each column is a distinct attribute of the phenomenon described by the data. While there are many thorny aspects to data wrangling, perhaps none is more basic than the challenge of getting data reorganized, positionally, into the right form for analysis.
Citations: 0