Proceedings of the 2018 International Conference on Management of Data最新文献

筛选
英文 中文
Session details: Research 4: Query Processing 会议详情:研究4:查询处理
H. Pirk
{"title":"Session details: Research 4: Query Processing","authors":"H. Pirk","doi":"10.1145/3258008","DOIUrl":"https://doi.org/10.1145/3258008","url":null,"abstract":"","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87595786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Online Processing Algorithms for Influence Maximization 影响最大化的在线处理算法
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183749
Jing Tang, Xueyan Tang, Xiaokui Xiao, Junsong Yuan
{"title":"Online Processing Algorithms for Influence Maximization","authors":"Jing Tang, Xueyan Tang, Xiaokui Xiao, Junsong Yuan","doi":"10.1145/3183713.3183749","DOIUrl":"https://doi.org/10.1145/3183713.3183749","url":null,"abstract":"Influence maximization is a classic and extensively studied problem with important applications in viral marketing. Existing algorithms for influence maximization, however, mostly focus on offline processing, in the sense that they do not provide any output to the user until the final answer is derived, and that the user is not allowed to terminate the algorithm early to trade the quality of solution for efficiency. Such lack of interactiveness and flexibility leads to poor user experience, especially when the algorithm incurs long running time. To address the above problem, this paper studies algorithms for online processing of influence maximization (OPIM), where the user can pause the algorithm at any time and ask for a solution (to the influence maximization problem) and its approximation guarantee, and can resume the algorithm to let it improve the quality of solution by giving it more time to run. (This interactive paradigm is similar in spirit to online query processing in database systems.) We show that the only existing algorithm for OPIM is vastly ineffective in practice, and that adopting existing influence maximization methods for OPIM yields unsatisfactory results. Motivated by this, we propose a new algorithm for OPIM with both superior empirical effectiveness and strong theoretical guarantees, and we show that it can also be extended to handle conventional influence maximization. Extensive experiments on real data demonstrate that our solutions outperform the state of the art for both OPIM and conventional influence maximization.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"279 1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86170758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 122
Adaptive Optimization of Very Large Join Queries 超大型连接查询的自适应优化
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183733
Thomas Neumann, Bernhard Radke
{"title":"Adaptive Optimization of Very Large Join Queries","authors":"Thomas Neumann, Bernhard Radke","doi":"10.1145/3183713.3183733","DOIUrl":"https://doi.org/10.1145/3183713.3183733","url":null,"abstract":"The use of business intelligence tools and other means to generate queries has led to great variety in the size of join queries. While most queries are reasonably small, join queries with up to a hundred relations are not that exotic anymore, and the distribution of query sizes has an incredible long tail. The largest real-world query that we are aware of accesses more than 4,000 relations. This large spread makes query optimization very challenging. Join ordering is known to be NP-hard, which means that we cannot hope to solve such large problems exactly. On the other hand most queries are much smaller, and there is no reason to sacrifice optimality there. This paper introduces an adaptive optimization framework that is able to solve most common join queries exactly, while simultaneously scaling to queries with thousands of joins. A key component there is a novel search space linearization technique that leads to near-optimal execution plans for large classes of queries. In addition, we describe implementation techniques that are necessary to scale join ordering algorithms to these extremely large queries. Extensive experiments with over 10 different approaches show that the new adaptive approach proposed here performs excellent over a huge spectrum of query sizes, and produces optimal or near-optimal solutions for most common queries.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88569765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 52
Approximate Triangle Count and Clustering Coefficient 近似三角形计数和聚类系数
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183715
Siddharth Bhatia
{"title":"Approximate Triangle Count and Clustering Coefficient","authors":"Siddharth Bhatia","doi":"10.1145/3183713.3183715","DOIUrl":"https://doi.org/10.1145/3183713.3183715","url":null,"abstract":"Two important metrics used to characterise a graph are its triangle count and clustering coefficient. In this paper, we present methods to approximate these metrics for graphs.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89382566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Big Data Linkage for Product Specification Pages 产品规格页面的大数据联动
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183757
Disheng Qiu, Luciano Barbosa, Valter Crescenzi, P. Merialdo, D. Srivastava
{"title":"Big Data Linkage for Product Specification Pages","authors":"Disheng Qiu, Luciano Barbosa, Valter Crescenzi, P. Merialdo, D. Srivastava","doi":"10.1145/3183713.3183757","DOIUrl":"https://doi.org/10.1145/3183713.3183757","url":null,"abstract":"An increasing number of product pages are available from thousands of web sources, each page associated with a product, containing its attributes and one or more product identifiers. The sources provide overlapping information about the products, using diverse schemas, making web-scale integration extremely challenging. In this paper, we take advantage of the opportunity that sources publish product identifiers to perform big data linkage across sources at the beginning of the data integration pipeline, before schema alignment. To realize this opportunity, several challenges need to be addressed: identifiers need to be discovered on product pages, made difficult by the diversity of identifiers; the main product identifier on the page needs to be identified, made difficult by the many related products presented on the page; and identifiers across pages need to beresolved, made difficult by the ambiguity between identifiers across product categories. We present our RaF (Redundancy as Friend) solution to the problem of big data linkage for product specification pages, which takes advantage of the redundancy of identifiers at a global level, and the homogeneity of structure and semantics at the local source level, to effectively and efficiently link millions of pages of head and tail products across thousands of head and tail sources. We perform a thorough empirical evaluation of our RaF approach using the publicly available Dexter dataset consisting of 1.9M product pages from 7.1k sources of 3.5k websites, and demonstrate its effectiveness in practice.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"99 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86953641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel 数据按例转换(TDE): Excel中的可扩展数据转换
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193539
Yeye He, Kris Ganjam, Kukjin Lee, Yue Wang, Vivek R. Narasayya, S. Chaudhuri, Xu Chu, Yudian Zheng
{"title":"Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel","authors":"Yeye He, Kris Ganjam, Kukjin Lee, Yue Wang, Vivek R. Narasayya, S. Chaudhuri, Xu Chu, Yudian Zheng","doi":"10.1145/3183713.3193539","DOIUrl":"https://doi.org/10.1145/3183713.3193539","url":null,"abstract":"Business analysts and data scientists today increasingly need to clean, standardize and transform diverse data sets, such as name, address, date time, phone number, etc., before they can perform analysis. These ad-hoc transformation problems are typically solved by one-off scripts, which is both difficult and time-consuming. Our observation is that these domain-specific transformation problems have long been solved by developers with code libraries, which are often shared in places like GitHub. We thus develop an extensible data transformation system called Transform-Data-by-Example (TDE) that can leverage rich transformation logic in source code, DLLs, web services and mapping tables, so that end-users only need to provide a few (typically 3) input/output examples, and TDE can synthesize desired programs using relevant transformation logic from these sources. The beta version of TDE was released in Office Store for Excel.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90281917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
Data Integration and Machine Learning: A Natural Synergy 数据集成和机器学习:自然的协同作用
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.14778/3229863.3229876
X. Dong, Theodoros Rekatsinas
{"title":"Data Integration and Machine Learning: A Natural Synergy","authors":"X. Dong, Theodoros Rekatsinas","doi":"10.14778/3229863.3229876","DOIUrl":"https://doi.org/10.14778/3229863.3229876","url":null,"abstract":"There is now more data to analyze than ever before. As data volume and variety have increased, so have the ties between machine learning and data integration become stronger. For machine learning to be effective, one must utilize data from the greatest possible variety of sources; and this is why data integration plays a key role. At the same time machine learning is driving automation in data integration, resulting in overall reduction of integration costs and improved accuracy. This tutorial focuses on three aspects of the synergistic relationship between data integration and machine learning: (1) we survey how state-of-the-art data integration solutions rely on machine learning-based approaches for accurate results and effective human-in-the-loop pipelines, (2) we review how end-to-end machine learning applications rely on data integration to identify accurate, clean, and relevant data for their analytics exercises, and (3) we discuss open research challenges and opportunities that span across data integration and machine learning.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90449046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 95
DPaxos: Managing Data Closer to Users for Low-Latency and Mobile Applications DPaxos:管理数据更接近用户的低延迟和移动应用程序
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196928
Faisal Nawab, D. Agrawal, A. E. Abbadi
{"title":"DPaxos: Managing Data Closer to Users for Low-Latency and Mobile Applications","authors":"Faisal Nawab, D. Agrawal, A. E. Abbadi","doi":"10.1145/3183713.3196928","DOIUrl":"https://doi.org/10.1145/3183713.3196928","url":null,"abstract":"In this paper, we propose Dynamic Paxos (DPaxos), a Paxos-based consensus protocol to manage access to partitioned data across globally-distributed datacenters and edge nodes. DPaxos is intended to implement a State Machine Replication component in data management systems for the edge. DPaxos targets the unique opportunities of utilizing edge computing resources to support emerging applications with stringent mobility and real-time requirements such as Augmented and Virtual Reality and vehicular applications. The main objective of DPaxos is to reduce the latency of serving user requests, recovering from failures, and reacting to mobility. DPaxos achieves these objectives by a few proposed changes to the traditional Paxos protocol. Most notably, DPaxos proposes a dynamic allocation of quorums ( i.e. , groups of nodes) that are needed for Paxos Leader Election. Leader Election quorums in DPaxos are smaller than traditional Paxos and expand only in the presence of conflicts.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"78 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76561711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 57
Special Session: A Technical Research Agenda in Data Ethics and Responsible Data Management 特别会议:数据伦理和负责任数据管理的技术研究议程
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3205185
Julia Stoyanovich, Bill Howe, H. Jagadish
{"title":"Special Session: A Technical Research Agenda in Data Ethics and Responsible Data Management","authors":"Julia Stoyanovich, Bill Howe, H. Jagadish","doi":"10.1145/3183713.3205185","DOIUrl":"https://doi.org/10.1145/3183713.3205185","url":null,"abstract":"SESSION DESCRIPTION Recently, there has begun a movement towards fairness, accountability, and transparency (FAT) in algorithmic decision making, and in data science more broadly [1–4]. The database community has not been significantly involved in this movement, despite “owning” the models, languages, and systems that produce the input to the machine learning applications that are often the focus in data science. If training data are biased, or have errors, it stands to reason that the algorithmic result will also be unfair or erroneous. Similarly, transparency of just the algorithm is usually insufficient to understand why certain results were obtained: one needs also to know the data used. In short, FAT depend not just on the algorithm, but also on the data. This observation raises several important questions: What are the core data management issues to which the objectives of fairness, accountability and transparency give rise? What role should the database community play in this movement? Will emphasis on these topics dilute our core competency in techniques and technologies for data, or can it reinforce our central role in technology stacks ranging from startups to the enterprise, and from local non-profits to the federal government? This special session features leading researchers from machine learning, software engineering, security and privacy, and natural language processing, who are doing exciting technical work in FAT. The goal of this session is to outline a technical research agenda in data management foundations and systems around data ethics.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"103 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78227358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Session details: Industry 1: Adaptive Query Processing 行业1:自适应查询处理
J. Dittrich
{"title":"Session details: Industry 1: Adaptive Query Processing","authors":"J. Dittrich","doi":"10.1145/3258006","DOIUrl":"https://doi.org/10.1145/3258006","url":null,"abstract":"","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"933 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78394440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信