Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data: Latest Papers

High-Dimensional Vector Similarity Search: From Time Series to Deep Network Embeddings
Karima Echihabi
{"title":"High-Dimensional Vector Similarity Search: From Time Series to Deep Network Embeddings","authors":"Karima Echihabi","doi":"10.1145/3318464.3384402","DOIUrl":"https://doi.org/10.1145/3318464.3384402","url":null,"abstract":"Similarity search is an important and challenging problem that is typically modeled as nearest neighbor search in high dimensional space, where objects are represented as high dimensional vectors and their (dis)similarity is evaluated using a distance measure such as the Euclidean distance.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"398 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114925250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Beyond Analytics: The Evolution of Stream Processing Systems
Paris Carbone, Marios Fragkoulis, Vasiliki Kalavri, Asterios Katsifodimos
{"title":"Beyond Analytics: The Evolution of Stream Processing Systems","authors":"Paris Carbone, Marios Fragkoulis, Vasiliki Kalavri, Asterios Katsifodimos","doi":"10.1145/3318464.3383131","DOIUrl":"https://doi.org/10.1145/3318464.3383131","url":null,"abstract":"Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. The goal of this tutorial is threefold. First, we aim to review and highlight noteworthy past research findings, which were largely ignored until very recently. Second, we intend to underline the differences between early ('00-'10) and modern ('11-'18) streaming systems, and how those systems have evolved through the years. Most importantly, we wish to turn the attention of the database community to recent trends: streaming systems are no longer used only for classic stream processing workloads, namely window aggregates and joins. Instead, modern streaming systems are being increasingly used to deploy general event-driven applications in a scalable fashion, challenging the design decisions, architecture and intended use of existing stream processing systems.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122466026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
vChain: A Blockchain System Ensuring Query Integrity
Haixin Wang, Cheng Xu, Ce Zhang, Jianliang Xu
{"title":"vChain: A Blockchain System Ensuring Query Integrity","authors":"Haixin Wang, Cheng Xu, Ce Zhang, Jianliang Xu","doi":"10.1145/3318464.3384682","DOIUrl":"https://doi.org/10.1145/3318464.3384682","url":null,"abstract":"This demonstration presents vChain, a blockchain system that ensures query integrity. With the proliferation of blockchain applications and services, there has been an increasing demand for querying the data stored in a blockchain database. However, existing solutions either are at the risk of losing query integrity, or require users to maintain a full copy of the blockchain database. In comparison, by employing a novel verifiable query processing framework, vChain enables a lightweight user to authenticate the query results returned from a potentially untrusted service provider. We demonstrate its verifiable query operations, usability, and performance with visualization for better insights. We also showcase how users can detect falsified results in the case that the service provider is compromised.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131307768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Data Series Progressive Similarity Search with Probabilistic Quality Guarantees
Anna Gogolou, Theophanis Tsandilas, Karima Echihabi, A. Bezerianos, Themis Palpanas
{"title":"Data Series Progressive Similarity Search with Probabilistic Quality Guarantees","authors":"Anna Gogolou, Theophanis Tsandilas, Karima Echihabi, A. Bezerianos, Themis Palpanas","doi":"10.1145/3318464.3389751","DOIUrl":"https://doi.org/10.1145/3318464.3389751","url":null,"abstract":"Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We provide both initial and progressive estimates of the final answer that are getting better during the similarity search, as well as suitable stopping criteria for the progressive queries. Experiments with synthetic and diverse real datasets demonstrate that our prediction methods constitute the first practical solution to the problem, significantly outperforming competing approaches.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128384574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Lethe: A Tunable Delete-Aware LSM Engine
Subhadeep Sarkar, Tarikul Islam Papon, Dimitris Staratzis, Manos Athanassoulis
{"title":"Lethe: A Tunable Delete-Aware LSM Engine","authors":"Subhadeep Sarkar, Tarikul Islam Papon, Dimitris Staratzis, Manos Athanassoulis","doi":"10.1145/3318464.3389757","DOIUrl":"https://doi.org/10.1145/3318464.3389757","url":null,"abstract":"Data-intensive applications fueled the evolution of log structured merge (LSM) based key-value engines that employ the out-of-place paradigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost of treating deletes as a second-class citizen. A delete inserts a tombstone that invalidates older instances of the deleted key. State-of-the-art LSM engines do not provide guarantees as to how fast a tombstone will propagate to persist the deletion. Further, LSM engines only support deletion on the sort key. To delete on another attribute (e.g., timestamp), the entire tree is read and re-written. We highlight that fast persistent deletion without affecting read performance is key to support: (i) streaming systems operating on a window of data, (ii) privacy with latency guarantees on the right-to-be-forgotten, and (iii) en masse cloud deployment of data systems that makes storage a precious resource. To address these challenges, in this paper, we build a new key-value storage engine, Lethe, that uses a very small amount of additional metadata, a set of new delete-aware compaction policies, and a new physical data layout that weaves the sort and the delete key order. We show that Lethe supports any user-defined threshold for the delete persistence latency offering higher read throughput (1.17-1.4x) and lower space amplification (2.1-9.8x), with a modest increase in write amplification (between 4% and 25%). In addition, Lethe supports efficient range deletes on a secondary delete key by dropping entire data pages without sacrificing read performance nor employing a costly full tree merge.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130757551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines
Bonaventura Del Monte, Steffen Zeuch, T. Rabl, V. Markl
{"title":"Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines","authors":"Bonaventura Del Monte, Steffen Zeuch, T. Rabl, V. Markl","doi":"10.1145/3318464.3389723","DOIUrl":"https://doi.org/10.1145/3318464.3389723","url":null,"abstract":"Scale-out stream processing engines (SPEs) are powering large big data applications on high velocity data streams. Industrial setups require SPEs to sustain outages, varying data rates, and low-latency processing. SPEs need to transparently reconfigure stateful queries during runtime. However, state-of-the-art SPEs are not ready yet to handle on-the-fly reconfigurations of queries with terabytes of state due to three problems. These are network overhead for state migration, consistency, and overhead on data processing. In this paper, we propose Rhino, a library for efficient reconfigurations of running queries in the presence of very large distributed state. Rhino provides a handover protocol and a state migration protocol to consistently and efficiently migrate stream processing among servers. Overall, our evaluation shows that Rhino scales with state sizes of up to TBs, reconfigures a running query 15 times faster than the state-of-the-art, and reduces latency by three orders of magnitude upon a reconfiguration.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134401485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 39
SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data
Vraj Shah, Side Li, Arun Kumar, L. Saul
{"title":"SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data","authors":"Vraj Shah, Side Li, Arun Kumar, L. Saul","doi":"10.1145/3318464.3389777","DOIUrl":"https://doi.org/10.1145/3318464.3389777","url":null,"abstract":"Speech-driven querying is becoming popular in new device environments such as smartphones, tablets, and even conversational assistants. However, such querying is largely restricted to natural language. Typed SQL remains the gold standard for sophisticated structured querying although it is painful in many environments, which restricts when and how users consume their data. In this work, we propose to bridge this gap by designing a speech-driven querying system and interface for structured data we call SpeakQL. We support a practically useful subset of regular SQL and allow users to query in any domain with novel touch/speech based human-in-the-loop correction mechanisms. Automatic speech recognition (ASR) introduces myriad forms of errors in transcriptions, presenting us with a technical challenge. We exploit our observations of SQL's properties, its grammar, and the queried database to build a modular architecture. We present the first dataset of spoken SQL queries and a generic approach to generate them for any arbitrary schema. Our experiments show that SpeakQL can automatically correct a large fraction of errors in ASR transcriptions. User studies show that SpeakQL can help users specify SQL queries significantly faster with a speedup of average 2.7x and up to 6.7x compared to typing on a tablet device. SpeakQL also reduces the user effort in specifying queries by a factor of average 10x and up to 60x compared to raw typing effort.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130246078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
MC3: A System for Minimization of Classifier Construction Cost
Shay Gershtein, T. Milo, Gefen Morami, Slava Novgorodov
{"title":"MC3: A System for Minimization of Classifier Construction Cost","authors":"Shay Gershtein, T. Milo, Gefen Morami, Slava Novgorodov","doi":"10.1145/3318464.3384690","DOIUrl":"https://doi.org/10.1145/3318464.3384690","url":null,"abstract":"Search mechanisms over massive sets of items are the cornerstone of many modern applications, particularly in e-commerce websites. Consumers express in search queries a set of properties, and expect the system to retrieve qualifying items. A common difficulty, however, is that the information on whether or not an item satisfies the search criteria is sometimes not explicitly recorded in the repository. Instead, it may be considered as general knowledge or \"hidden\" in a picture/description, thereby leading to incomplete search results. To overcome these problems companies invest in building dedicated classifiers that determine whether an item satisfies the given search criteria. However, building classifiers typically incurs non-trivial costs due to the required volumes of high-quality labeled training data. In this demo, we introduce MC3, a real-time system that helps data analysts decide which classifiers to construct to minimize the costs of answering a set of search queries. MC3 is interactive and facilitates real-time analysis, by providing detailed classifiers impact information. We demonstrate the effectiveness of MC3 on real-world data and scenarios taken from a large e-commerce system, by interacting with the SIGMOD'20 audience members who act as analysts.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128253270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
CHASSIS: Conformity Meets Online Information Diffusion
Hui Li, Hui Li, S. Bhowmick
{"title":"CHASSIS: Conformity Meets Online Information Diffusion","authors":"Hui Li, Hui Li, S. Bhowmick","doi":"10.1145/3318464.3389780","DOIUrl":"https://doi.org/10.1145/3318464.3389780","url":null,"abstract":"Online information diffusion generates huge volumes of social activities (e.g., tweets, retweets, posts, comments, likes) among individuals. Existing information diffusion modeling techniques are oblivious to conformity of individuals during the diffusion process, a fundamental human trait according to social psychology theories. Intuitively, conformity captures the extent to which an individual complies with social norms or expectations. In this paper, we present a novel framework called CHASSIS to characterize online information diffusion by bridging classical information diffusion model with conformity from social psychology. To this end, we first extend \"Hawkes Process\", a well-known statistical technique utilized to model information diffusion, to quantitatively capture two flavors of conformity, informational conformity and normative conformity, hidden in activity sequences. Next, we present a novel semi-parametric inference approach to learn the proposed model. Experimental study with real-world datasets demonstrates the superiority of CHASSIS to state-of-the-art conformity-unaware information diffusion models.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116072018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Towards Scalable UDTFs in Noria
Justus Adam
{"title":"Towards Scalable UDTFs in Noria","authors":"Justus Adam","doi":"10.1145/3318464.3384412","DOIUrl":"https://doi.org/10.1145/3318464.3384412","url":null,"abstract":"User Defined Functions (UDF) are an important and powerful extension point for database queries. Systems using incremental materialized views largely do not support UDFs because they cannot easily be incrementalized. In this work we design single-tuple UDF and User Defined Aggregates (UDA) interfaces for Noria, a state-of-the-art dataflow system with incremental materialized views. We also add limited support for User Defined Table Functions (UDTF), by compiling them to query fragments. We show our UDTFs scale by implementing a motivational example used by Friedman et al.","PeriodicalId":436122,"journal":{"name":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122146909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0