ACM SIGMOD Record: Latest Articles

DFI: The Data Flow Interface for High-Speed Networks
ACM SIGMOD Record Pub Date : 2022-05-31 DOI: 10.1145/3542700.3542705
Lasse Thostrup, Jan Skrzypczak, Matthias Jasny, Tobias Ziegler, Carsten Binnig
Abstract: In this paper, we propose the Data Flow Interface (DFI) as a way to make it easier for data processing systems to exploit high-speed networks without the need to deal with the complexity of RDMA. By lifting the level of abstraction, DFI factors out much of the complexity of network communication and makes it easier for developers to declaratively express how data should be efficiently routed to accomplish a given distributed data processing task. As we show in our experiments, DFI is able to support a wide variety of data-centric applications with high performance at a low complexity for the applications.
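DFI's actual programming interface is not reproduced in this listing. As a purely illustrative sketch of what declaratively routing data can look like, the Python below models a hypothetical in-memory "shuffle flow": the caller declares the number of targets and a routing key once, and the flow, not the caller, decides which target buffer each tuple lands in. The names `ShuffleFlow`, `push`, `push_all`, and `consume` are assumptions for illustration, not DFI's API, and no networking or RDMA is involved.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable


class ShuffleFlow:
    """Hypothetical in-memory stand-in for a declarative shuffle flow.

    Sources push tuples; the flow routes each tuple to one of `num_targets`
    target buffers using a user-declared key function (hash partitioning).
    """

    def __init__(self, num_targets: int, key: Callable[[tuple], Hashable]):
        if num_targets <= 0:
            raise ValueError("num_targets must be positive")
        self.num_targets = num_targets
        self.key = key
        self.buffers = defaultdict(list)  # target id -> routed tuples

    def route(self, tup: tuple) -> int:
        # hash() of str keys is salted per process; ints hash to themselves.
        return hash(self.key(tup)) % self.num_targets

    def push(self, tup: tuple) -> None:
        self.buffers[self.route(tup)].append(tup)

    def push_all(self, tuples: Iterable[tuple]) -> None:
        for tup in tuples:
            self.push(tup)

    def consume(self, target: int) -> list:
        # A target drains everything routed to it so far.
        return self.buffers.pop(target, [])


flow = ShuffleFlow(num_targets=3, key=lambda t: t[0])
flow.push_all([(1, "a"), (4, "b"), (2, "c"), (7, "d")])
print(flow.consume(1))  # [(1, 'a'), (4, 'b'), (7, 'd')]
```

The point of the sketch is the separation between the routing declaration (target count and key) and the calls that move data; the example uses integer keys so that routing is reproducible across Python processes.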
Citations: 2
Technical Perspective
ACM SIGMOD Record Pub Date : 2022-05-31 DOI: 10.1145/3542700.3542712
N. Mamoulis
Abstract: The optimal assignment problem is a classic combinatorial optimization problem. Given a set A of n agents, a set T of m tasks, and an n×m cost matrix C, the objective is to find the matching between A and T that minimizes or maximizes an aggregate cost of the assigned agent-task pairs. In its standard definition, n = m and we are looking for the 1-to-1 matching with the minimum total cost. From a graph-theory perspective, this is a weighted bipartite graph matching problem. A classic algorithm for solving the assignment problem is the Hungarian algorithm (a.k.a. the Kuhn-Munkres algorithm) [3], which has O(n³) computational cost (assuming that n = m); this is the best run time of any strongly polynomial algorithm for this problem. There are many variants of the assignment problem, which differ in the optimization objective (e.g., minimize/maximize an aggregate cost, achieve a stable matching, maximize the number of agents matched with their top preferences) and in whether there are constraints on the number of matches for each agent or task.
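For the standard case described here (n = m, minimize total cost), the following Python sketch implements the Hungarian (Kuhn-Munkres) algorithm in its widely used O(n³) potential-based form and checks it against brute-force enumeration of all n! assignments. The function names and the example matrix are illustrative assumptions, not taken from the perspective or the paper it discusses.

```python
from itertools import permutations
from math import inf


def hungarian(cost):
    """Minimum-cost perfect assignment for a square n x n cost matrix.

    Potential-based O(n^3) Hungarian algorithm. Returns (total, assign),
    where assign[i] is the task (column) given to agent (row) i.
    """
    n = len(cost)
    # 1-indexed row potentials u, column potentials v; p[j] = row matched to column j.
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)
    p, way = [0] * (n + 1), [0] * (n + 1)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        # Grow an alternating tree from row i until a free column is reached.
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    reduced = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if reduced < minv[j]:
                        minv[j], way[j] = reduced, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so that at least one new column becomes tight.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Augment: flip the matching along the stored alternating path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return sum(cost[i][assign[i]] for i in range(n)), assign


def brute_force(cost):
    """Reference answer: try all n! one-to-one assignments."""
    n = len(cost)
    return min(
        (sum(cost[i][perm[i]] for i in range(n)), list(perm))
        for perm in permutations(range(n))
    )


print(hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # (5, [1, 0, 2])
```

The brute-force reference is only practical for very small n, which is exactly why an O(n³) method matters. This sketch accepts square matrices only; rectangular or maximization variants would need padding or negated costs.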
Citations: 0
Technical Perspective
ACM SIGMOD Record Pub Date : 2022-05-31 DOI: 10.1145/3542700.3542714
Bill Howe
Abstract: There is a tension between an imperative style for control flow that has been shown to be easier to use, especially for novices, and a functional style for control flow that better exposes optimization opportunities, thereby making the optimizers more capable. The authors of "Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance" propose Mitos, a program rewriting framework that achieves the best of both worlds by borrowing program analysis concepts from compilers and lifting them to the distributed dataflow regime. Dataflow systems require significant data movement during processing, which can be highly redundant and wasteful in the context of iteration: naive execution plans can reprocess the same massive dataset on each iteration, and iteration i+1 must wait until iteration i is finished. The authors design a mechanism for labeling each intermediate result with its execution path, allowing the system to simultaneously manage complex branching situations while also implementing efficient processing via loop pipelining, all by reasoning about and comparing execution paths.
Citations: 0
Structure and Complexity of Bag Consistency
ACM SIGMOD Record Pub Date : 2022-05-31 DOI: 10.1145/3542700.3542719
Albert Atserias, Phokion G. Kolaitis
Abstract: Since the early days of relational databases, it was realized that acyclic hypergraphs give rise to database schemas with desirable structural and algorithmic properties. In a by-now classical paper, Beeri, Fagin, Maier, and Yannakakis established several equivalent characterizations of acyclicity; in particular, they showed that the sets of attributes of a schema form an acyclic hypergraph if and only if the local-to-global consistency property for relations over that schema holds, which means that every collection of pairwise consistent relations over the schema is globally consistent. Even though real-life databases consist of bags (multisets), there has not been a study of the interplay between local consistency and global consistency for bags. We embark on such a study here, and we first show that the sets of attributes of a schema form an acyclic hypergraph if and only if the local-to-global consistency property for bags over that schema holds. After this, we explore algorithmic aspects of global consistency for bags by analyzing the computational complexity of the global consistency problem for bags: given a collection of bags, are these bags globally consistent? We show that this problem is in NP, even when the schema is part of the input. We then establish the following dichotomy theorem for fixed schemas: if the schema is acyclic, then the global consistency problem for bags is solvable in polynomial time, while if the schema is cyclic, then the global consistency problem for bags is NP-complete. The latter result contrasts sharply with the state of affairs for relations, where, for each fixed schema, the global consistency problem for relations is solvable in polynomial time.
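The notion of consistency in this abstract can be made concrete for the simplest case of two bags. Two bags R(X) and S(Y) are consistent when some bag over X ∪ Y has R and S as its marginals; for two bags this amounts to checking that their bag projections onto the shared attributes X ∩ Y agree, and a witness can then be built by greedily pairing tuples that share the same values on X ∩ Y. The Python below is a minimal sketch under these assumptions (bags as `Counter`s over value tuples with positive multiplicities, schemas as attribute-name tuples; the function names are hypothetical). It does not address the multi-bag, cyclic case whose NP-completeness the paper establishes.

```python
from collections import Counter, defaultdict


def marginal(schema, bag, attrs):
    """Bag projection: sum multiplicities of rows that agree on `attrs`."""
    idx = [schema.index(a) for a in attrs]
    out = Counter()
    for row, mult in bag.items():
        out[tuple(row[i] for i in idx)] += mult
    return out


def consistent(schema_r, r, schema_s, s):
    """Two bags are consistent iff their marginals on shared attributes agree."""
    common = [a for a in schema_r if a in schema_s]
    return marginal(schema_r, r, common) == marginal(schema_s, s, common)


def witness(schema_r, r, schema_s, s):
    """Return (schema_t, t) with marginals r and s, or None if inconsistent."""
    if not consistent(schema_r, r, schema_s, s):
        return None
    common = [a for a in schema_r if a in schema_s]
    extra = [a for a in schema_s if a not in schema_r]
    ir = [schema_r.index(a) for a in common]
    ic = [schema_s.index(a) for a in common]
    ie = [schema_s.index(a) for a in extra]
    # Group [row, remaining multiplicity] pairs by their shared-attribute values.
    groups_r, groups_s = defaultdict(list), defaultdict(list)
    for row, m in r.items():
        groups_r[tuple(row[i] for i in ir)].append([row, m])
    for row, m in s.items():
        groups_s[tuple(row[i] for i in ic)].append([row, m])
    t = Counter()
    for key, rs in groups_r.items():
        ss = groups_s[key]
        i = j = 0
        # Both sides carry the same total multiplicity for this key, so
        # greedy pairing exhausts them together.
        while i < len(rs) and j < len(ss):
            take = min(rs[i][1], ss[j][1])
            t[rs[i][0] + tuple(ss[j][0][k] for k in ie)] += take
            rs[i][1] -= take
            ss[j][1] -= take
            if rs[i][1] == 0:
                i += 1
            if ss[j][1] == 0:
                j += 1
    return tuple(schema_r) + tuple(extra), t


R_schema, R = ("A", "B"), Counter({(1, "x"): 2, (2, "x"): 1})
S_schema, S = ("B", "C"), Counter({("x", 10): 1, ("x", 20): 2})
T_schema, T = witness(R_schema, R, S_schema, S)
print(T_schema, dict(T))
# ('A', 'B', 'C') {(1, 'x', 10): 1, (1, 'x', 20): 1, (2, 'x', 20): 1}
```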
Citations: 2
FoundationDB
ACM SIGMOD Record Pub Date : 2022-05-31 DOI: 10.1145/3542700.3542707
Jingyu Zhou, Meng Xu, A. Shraer, B. Namasivayam, A. Miller, Evan Tschannen, Steve Atherton, Andrew J. Beamon, Rusty Sears, J. Leach, D. Rosenthal, Xin Dong, Willie B. Wilson, Ben Collins, David Scherer, Alec Grieser, Young Liu, Alvin Moore, Bhaskar Muppana, Xi-sheng Su, Vishesh Yadav
Abstract: FoundationDB is an open-source transactional key-value store created more than ten years ago. It is one of the first systems to combine the flexibility and scalability of NoSQL architectures with the power of ACID transactions. FoundationDB adopts an unbundled architecture that decouples an in-memory transaction management system, a distributed storage system, and a built-in distributed configuration system. Each subsystem can be independently provisioned and configured to achieve scalability, high availability, and fault tolerance. FoundationDB includes a deterministic simulation framework, used to test every new feature under a myriad of possible faults. FoundationDB offers a minimal and carefully chosen feature set, which has enabled a range of disparate systems to be built as layers on top. FoundationDB is the underpinning of cloud infrastructure at Apple, Snowflake, and other companies.
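The abstract does not describe how FoundationDB's simulation framework works internally, so the following is only a generic toy illustrating the idea of deterministic simulation, not FoundationDB's framework or API. In this Python sketch, a hypothetical sender retries messages over a lossy link inside a single-threaded, seeded event loop. Because every drop and delay is drawn from one seeded generator and event ties are broken by a sequence number, rerunning with the same seed replays exactly the same faulty execution, which is what makes a failure found in simulation reproducible.

```python
import heapq
import random


def simulate(seed, n_messages=5, drop_prob=0.3, max_delay=5, retry_after=10):
    """Toy deterministic simulation: a sender retries messages over a lossy link.

    Every nondeterministic choice (drop or not, delivery delay) comes from one
    seeded RNG, and events run from a heap in (time, seq) order, so the same
    seed always replays the same trace, including the same injected faults.
    """
    if not 0 <= drop_prob < 1:
        raise ValueError("drop_prob must be in [0, 1)")
    rng = random.Random(seed)
    events, seq, trace, delivered = [], 0, [], set()

    def schedule(time, kind, msg):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, msg))
        seq += 1  # tie-breaker keeps event order deterministic

    for msg in range(n_messages):
        schedule(0, "send", msg)
    while events:
        now, _, kind, msg = heapq.heappop(events)
        if kind == "send":
            if msg in delivered:
                continue  # already delivered; stop retrying
            trace.append((now, "send", msg))
            if rng.random() < drop_prob:
                trace.append((now, "drop", msg))  # injected fault
            else:
                schedule(now + rng.randint(1, max_delay), "deliver", msg)
            schedule(now + retry_after, "send", msg)  # retry timer
        elif kind == "deliver" and msg not in delivered:
            delivered.add(msg)
            trace.append((now, "deliver", msg))
    return trace


print(simulate(seed=42) == simulate(seed=42))  # True
```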
Citations: 0
No PANE, No Gain
ACM SIGMOD Record Pub Date : 2022-05-31 DOI: 10.1145/3542700.3542711
Renchi Yang, Jieming Shi, X. Xiao, Yin Yang, S. Bhowmick, Juncheng Liu
Abstract: Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v ∈ G to a compact vector X_v, which can be used in downstream machine learning tasks in a variety of applications. Existing ANE solutions do not scale to massive graphs, owing to prohibitive computation costs or the generation of low-quality embeddings. This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs on a single server that achieves state-of-the-art result quality on multiple benchmark datasets for two common prediction tasks: link prediction and node classification. Under the hood, PANE takes inspiration from well-established data management techniques to scale up ANE on a single server. Specifically, it exploits a carefully formulated problem based on a novel random walk model, a highly efficient solver, and non-trivial parallelization that utilizes modern multi-core CPUs. Extensive experiments demonstrate that PANE consistently outperforms all existing methods in terms of result quality, while being orders of magnitude faster.
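The abstract credits PANE's quality to a novel random walk model whose exact definition is not given here. As a loosely related, generic illustration (an assumption, not PANE's model or solver), the Python below computes a truncated random-walk-with-restart affinity from each node to each attribute on a toy attributed graph: a walk from node v stops after each step with probability `alpha`, and wherever it stops it emits one of that node's attributes in proportion to its weight. The function name and parameters are hypothetical; turning such affinities into compact vectors X_v is not shown.

```python
def rwr_attribute_affinity(adj, attrs, alpha=0.5, hops=20):
    """Truncated random-walk-with-restart affinity from nodes to attributes.

    adj[v]   -- list of out-neighbours of node v (each node needs at least one).
    attrs[v] -- dict mapping attribute -> positive weight for node v (at least one).
    aff[v][r] is the probability that a walk from v emits attribute r,
    counting only walks shorter than `hops` steps.
    """
    n = len(adj)
    all_attrs = sorted({r for a in attrs for r in a})
    # Row-normalised node -> attribute emission distribution.
    emit = [{r: w / sum(a.values()) for r, w in a.items()} for a in attrs]
    # dist[v][u]: probability that the walk from v is at u after `step` steps.
    dist = [[1.0 if u == v else 0.0 for u in range(n)] for v in range(n)]
    aff = [dict.fromkeys(all_attrs, 0.0) for _ in range(n)]
    for step in range(hops):
        stop = alpha * (1 - alpha) ** step  # P(walk has exactly `step` steps)
        for v in range(n):
            for u, p in enumerate(dist[v]):
                if p:
                    for r, w in emit[u].items():
                        aff[v][r] += stop * p * w
        # Advance every walk along one uniformly random out-edge.
        nxt = [[0.0] * n for _ in range(n)]
        for v in range(n):
            for u, p in enumerate(dist[v]):
                if p:
                    for x in adj[u]:
                        nxt[v][x] += p / len(adj[u])
        dist = nxt
    return aff


adj = [[1, 2], [2], [0], [0, 2]]
attrs = [{"a": 1.0}, {"a": 2.0, "b": 2.0}, {"c": 1.0}, {"b": 1.0}]
aff = rwr_attribute_affinity(adj, attrs)
print(round(sum(aff[0].values()), 6))  # 0.999999, i.e. 1 - 0.5**20
```

Each row sums to 1 - (1 - alpha)^hops, the probability mass of walks shorter than the truncation length.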
Citations: 2
Technical Perspective
ACM SIGMOD Record Pub Date : 2022-05-31 DOI: 10.1145/3542700.3542716
R. Pagh
Abstract: The paper "Relative Error Streaming Quantiles", by Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, and Pavel Veselý, studies a fundamental question in data stream processing, namely how to maintain information about the distribution of data in the form of quantiles. More precisely, given a stream S of elements from some ordered universe U, we wish to maintain a compact summary data structure that allows us to estimate the number of elements in the stream that are smaller than a given query element y ∈ U, i.e., estimate the rank of y. Solutions to this problem have numerous applications in large-scale data analysis and can potentially be used for range query selectivity estimation in database engines.
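The rank query at the heart of this problem can be pinned down with a short Python sketch: `exact_rank` answers it exactly from a sorted copy of the stream, while `ReservoirRank` estimates it from a fixed-size uniform reservoir sample scaled up by n/k. The reservoir estimator is a deliberately simple baseline chosen for illustration; it gives additive rather than relative error and is not the relative-error sketch studied in the paper. Both names are hypothetical.

```python
import bisect
import random


def exact_rank(sorted_items, y):
    """Exact rank: number of items strictly smaller than y (input must be sorted)."""
    return bisect.bisect_left(sorted_items, y)


class ReservoirRank:
    """Baseline rank estimator from a uniform reservoir sample of size k.

    Additive-error baseline for illustration only; not a relative-error sketch.
    """

    def __init__(self, k, seed=0):
        self.k = k
        self.n = 0  # stream length seen so far
        self.sample = []
        self.rng = random.Random(seed)

    def update(self, x):
        self.n += 1
        if len(self.sample) < self.k:
            self.sample.append(x)
        else:
            # Algorithm R: keep x with probability k / n, evicting a random slot.
            j = self.rng.randrange(self.n)
            if j < self.k:
                self.sample[j] = x

    def rank(self, y):
        if not self.sample:
            return 0.0
        smaller = sum(1 for s in self.sample if s < y)
        return smaller * self.n / len(self.sample)


print(exact_rank([1, 2, 2, 3], 2))  # 1
```

The baseline's error is roughly proportional to n regardless of y, so it is weakest exactly where relative-error guarantees matter most: for queries y with small rank.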
Citations: 0
Technical Perspective of TURL
ACM SIGMOD Record Pub Date : 2022-05-31 DOI: 10.1145/3542700.3542708
Paolo Papotti
Abstract: Several efforts aim at representing tabular data with neural models to support target applications at the intersection of natural language processing (NLP) and databases (DB) [1-3]. The goal is to extend to structured data the recent neural architectures that achieve state-of-the-art results in NLP applications. Language models (LMs) are usually pre-trained with unsupervised tasks on a large text corpus. The resulting LM is then fine-tuned on a variety of downstream tasks with a small set of task-specific examples. This process has many advantages, because the LM contains information about textual structure and content that the target application can use without manually defined features.
Citations: 0
Bipartite Matching
ACM SIGMOD Record Pub Date : 2022-05-31 DOI: 10.1145/3542700.3542713
Tenindra Abeywickrama, Victor Liang, K. Tan
Abstract: The Kuhn-Munkres (KM) algorithm is a classical combinatorial optimization algorithm that is widely used for minimum-cost bipartite matching in many real-world applications, such as transportation. For example, a ride-hailing service may use it to find the optimal assignment of drivers to passengers that minimizes the overall wait time. Typically, given the two sides of a bipartite graph, this process involves computing the edge costs between all pairs and then finding an optimal matching. However, existing works overlook the impact of edge cost computation on the overall running time. In reality, edge computation often significantly outweighs the computation of the optimal assignment itself, as in the case of assigning drivers to passengers, which involves computing expensive graph shortest paths. Building on this, we also observe that common real-world settings exhibit a useful property that allows us to incrementally compute edge costs only as required, using an inexpensive lower-bound heuristic. This technique significantly reduces the overall cost of assignment compared to the original KM algorithm, as we demonstrate experimentally on multiple real-world data sets and workloads. Moreover, our algorithm is not limited to this domain and is potentially applicable in other settings where lower-bounding heuristics are available.
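The lower-bound idea in this abstract can be illustrated on a much smaller sub-problem than full Kuhn-Munkres: finding the cheapest driver for a single passenger when exact costs are expensive. The Python sketch below assumes each candidate has a cheap lower bound (here straight-line distance, which never exceeds Manhattan distance, standing in for a road-network shortest path) and an expensive exact cost supplied through a hypothetical `exact_cost` callback. It evaluates exact costs in increasing lower-bound order and stops once the next lower bound cannot beat the best exact cost seen so far. This is generic best-first pruning, not the authors' incremental KM algorithm.

```python
import math


def cheapest_with_lower_bounds(candidates, lower_bound, exact_cost):
    """Return (best candidate, best exact cost, number of exact evaluations).

    Assumes lower_bound(c) <= exact_cost(c) for every candidate c. Exact costs
    are computed lazily in order of increasing lower bound, and the scan stops
    once no remaining candidate can beat the best exact cost found so far.
    """
    best, best_cost, evaluations = None, math.inf, 0
    for c in sorted(candidates, key=lower_bound):
        if lower_bound(c) >= best_cost:
            break  # all remaining lower bounds are at least this large
        cost = exact_cost(c)
        evaluations += 1
        if cost < best_cost:
            best, best_cost = c, cost
    return best, best_cost, evaluations


# Toy stand-ins: straight-line distance never exceeds Manhattan distance,
# which here plays the role of an expensive road-network shortest path.
PASSENGER = (0.0, 0.0)


def straight_line(driver):
    return math.dist(PASSENGER, driver)


def manhattan(driver):
    return abs(driver[0] - PASSENGER[0]) + abs(driver[1] - PASSENGER[1])


drivers = [(3.0, 4.0), (1.0, 1.0), (6.0, 0.0), (0.5, 2.0)]
print(cheapest_with_lower_bounds(drivers, straight_line, manhattan))
# ((1.0, 1.0), 2.0, 1): only one exact cost was needed
```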
Citations: 1
Imperative or Functional Control Flow Handling
ACM SIGMOD Record Pub Date : 2022-05-31 DOI: 10.1145/3542700.3542715
G. Gévay, T. Rabl, S. Breß, Lorand Madai-Tahy, Jorge-Arnulfo Quiané-Ruiz, V. Markl
Abstract: Modern data analysis tasks often involve control flow statements, such as the iterations in PageRank and K-means. To achieve scalability, developers usually implement these tasks in distributed dataflow systems, such as Spark and Flink. Designers of such systems have to choose between providing imperative or functional control flow constructs to users. Imperative constructs are easier to use, but functional constructs are easier to compile to an efficient dataflow job. We propose Mitos, a system where control flow is both easy to use and efficient. Mitos relies on an intermediate representation based on the static single assignment form. This allows us to abstract away from specific control flow constructs and treat any imperative control flow uniformly both when building the dataflow job and when coordinating the distributed execution.
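The trade-off described here can be seen in miniature with PageRank on a toy graph. The Python sketch below writes the same fixed number of iterations twice: once as an imperative loop that reassigns a variable, and once in a functional style as a fold of a pure step function with `functools.reduce`, the kind of construct that is easier for a compiler to analyze as a whole. The graph, damping factor, and iteration count are illustrative assumptions; this shows only the two programming styles, not Mitos or its SSA-based intermediate representation.

```python
from functools import reduce

# Toy directed graph: node -> out-neighbours (every node has at least one).
GRAPH = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
DAMPING = 0.85


def pagerank_step(ranks):
    """One PageRank iteration as a pure function of the previous ranks."""
    n = len(GRAPH)
    new = {v: (1 - DAMPING) / n for v in GRAPH}
    for v, outs in GRAPH.items():
        share = DAMPING * ranks[v] / len(outs)
        for w in outs:
            new[w] += share
    return new


def pagerank_imperative(iterations):
    # Imperative control flow: a loop that reassigns a variable each round.
    ranks = {v: 1 / len(GRAPH) for v in GRAPH}
    for _ in range(iterations):
        ranks = pagerank_step(ranks)
    return ranks


def pagerank_functional(iterations):
    # Functional control flow: the loop becomes a fold over the step function.
    init = {v: 1 / len(GRAPH) for v in GRAPH}
    return reduce(lambda ranks, _: pagerank_step(ranks), range(iterations), init)


print(pagerank_imperative(20) == pagerank_functional(20))  # True
```

Both versions perform identical floating-point operations in the same order, so their results match exactly; the difference lies only in how visible the loop structure is to a system that has to plan its execution.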
Citations: 0