Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data: Latest Articles

Efficient Route Planning on Public Transportation Networks: A Labelling Approach
Sibo Wang, Wenqing Lin, Yi Yang, Xiaokui Xiao, Shuigeng Zhou
{"title":"Efficient Route Planning on Public Transportation Networks: A Labelling Approach","authors":"Sibo Wang, Wenqing Lin, Yi Yang, Xiaokui Xiao, Shuigeng Zhou","doi":"10.1145/2723372.2749456","DOIUrl":"https://doi.org/10.1145/2723372.2749456","url":null,"abstract":"A public transportation network can often be modeled as a timetable graph where (i) each node represents a station; and (ii) each directed edge (u,v) is associated with a timetable that records the departure (resp. arrival) time of each vehicle at station u (resp. v). Several techniques have been proposed for various types of route planning on timetable graphs, e.g., retrieving the route from one node to another with the shortest travel time. These techniques, however, either provide insufficient query efficiency or incur significant space overheads. This paper presents Timetable Labelling (TTL), an efficient indexing technique for route planning on timetable graphs. The basic idea of TTL is to associate each node u with a set of labels, each of which records the shortest travel time from u to some other node v given a certain departure time from u; such labels would then be used during query processing to improve efficiency. In addition, we propose query algorithms that enable TTL to support three popular types of route planning queries, and investigate how we reduce the space consumption of TTL with advanced preprocessing and label compression methods. 
By conducting an extensive set of experiments on real-world datasets, we demonstrate that TTL significantly outperforms the state of the art in terms of query efficiency, while incurring moderate preprocessing and space overheads.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128696858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 96
D2WORM: A Management Infrastructure for Distributed Data-centric Workflows
Martin Jergler, Mohammad Sadoghi, H. Jacobsen
{"title":"D2WORM: A Management Infrastructure for Distributed Data-centric Workflows","authors":"Martin Jergler, Mohammad Sadoghi, H. Jacobsen","doi":"10.1145/2723372.2735362","DOIUrl":"https://doi.org/10.1145/2723372.2735362","url":null,"abstract":"Unlike traditional activity-flow-based models, data-centric workflows primarily focus on the data to drive a business. This enables the unification of operational management, concurrent process analytics, compliance with process or associated data constraints, and adaptability to changing environments. In this demonstration, we present D2Worm, a Distributed Data-centric Workflow Management system. D2Worm allows users to (1) graphically model data-centric workflows in a declarative fashion based on the Guard-Stage-Milestone (GSM) meta-model, (2) automatically compile the modelled workflow into several fine-granular workflow units (WFUs), and (3) deploy these WFUs on distributed infrastructures. A WFU is a system component that manages a subset of the workflow's data model and, at the same time, represents part of the global control flow by evaluating conditions over the data. WFUs communicate with each other over a publish/subscribe messaging infrastructure that allows the architecture to scale from a single node to dozens of machines distributed over different data-centers. 
In addition, D2Worm is able to (4) concurrently execute multiple workflow instances and monitor their behavior in real-time.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129559970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype?
C. Ré, D. Agrawal, M. Balazinska, Michael J. Cafarella, Michael I. Jordan, Tim Kraska, R. Ramakrishnan
{"title":"Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype?","authors":"C. Ré, D. Agrawal, M. Balazinska, Michael J. Cafarella, Michael I. Jordan, Tim Kraska, R. Ramakrishnan","doi":"10.1145/2723372.2742911","DOIUrl":"https://doi.org/10.1145/2723372.2742911","url":null,"abstract":"Machine learning seems to be eating the world with a new breed of high-value data-driven applications in image analysis, search, voice recognition, mobile, and office productivity products. To paraphrase Mike Stonebraker, machine learning is no longer a zero-billion-dollar business. As the home of high-value, data-driven applications for over four decades, a natural question for database researchers to ask is: what role should the database community play in these new data-driven machine-learning-based applications?","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129837618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
ShareInsights: An Unified Approach to Full-stack Data Processing
Mukund Deshpande
{"title":"ShareInsights: An Unified Approach to Full-stack Data Processing","authors":"Mukund Deshpande","doi":"10.1145/2723372.2742800","DOIUrl":"https://doi.org/10.1145/2723372.2742800","url":null,"abstract":"The field of data analysis seeks to extract value from data for either business or scientific benefit. This field has seen a renewed interest with the advent of big data technologies and a new organizational role called data scientist. Even with the newfound focus, the task of analyzing large amounts of data is still challenging and time-consuming. The essence of data analysis involves setting up data pipelines, which consist of several operations that are chained together, starting from data collection and continuing through data quality checks, data integration, data analysis, and data visualization (including the setting up of interaction paths in that visualization). In our opinion, the challenges stem from the technology diversity at each stage of the data pipeline as well as the lack of process around the analysis. In this paper we present a platform that aims to significantly reduce the time it takes to build data pipelines. The platform attempts to achieve this in the following ways: (1) allow the user to describe the entire data pipeline with a single language and idioms, all the way from data ingestion to insight expression (via visualization and end-user interaction); (2) provide a rich library of parts that allow users to quickly assemble a data analysis pipeline in the language; and (3) allow for a collaboration model that allows multiple users to work together on a data analysis pipeline as well as leverage and extend prior work with minimal effort. We studied the efficacy of the platform for a data hackathon competition conducted in our organization. The hackathon provided us with a way to study the impact of the approach. Rich data pipelines which traditionally took weeks to build were constructed and deployed in hours. 
Consequently, we believe that the complexity of designing and running the data analysis pipeline can be significantly reduced, leading to a marked improvement in the productivity of data analysts/data scientists.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127905997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Efficient Enumeration of Maximal k-Plexes
D. Berlowitz, Sara Cohen, B. Kimelfeld
{"title":"Efficient Enumeration of Maximal k-Plexes","authors":"D. Berlowitz, Sara Cohen, B. Kimelfeld","doi":"10.1145/2723372.2746478","DOIUrl":"https://doi.org/10.1145/2723372.2746478","url":null,"abstract":"The problem of enumerating (i.e., generating) all maximal cliques in a graph has received extensive treatment, due to the plethora of applications in various areas such as data mining, bioinformatics, network analysis and community detection. However, requiring the enumerated subgraphs to be full cliques is too restrictive in common real-life scenarios where \"almost cliques\" are equally useful. Hence, the notion of a k-plex, a clique relaxation that allows every node to be \"missing\" k neighbors, has been introduced. But this seemingly minor relaxation casts existing algorithms for clique enumeration inapplicable, for inherent reasons. This paper presents the first provably efficient algorithms, both for enumerating the maximal k-plexes and for enumerating the maximal connected k-plexes. Our algorithms run in polynomial delay for a constant k and incremental FPT delay when k is a parameter. The importance of such algorithms is in the areas mentioned above, as well as in new applications. 
Extensive experimentation over both real and synthetic datasets shows the efficiency of our algorithms, and their scalability with respect to graph size, density and choice of k, as well as their clear superiority over the state-of-the-art.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126227248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 65
Skew-Aware Join Optimization for Array Databases
Jennie Duggan, Olga Papaemmanouil, L. Battle, M. Stonebraker
{"title":"Skew-Aware Join Optimization for Array Databases","authors":"Jennie Duggan, Olga Papaemmanouil, L. Battle, M. Stonebraker","doi":"10.1145/2723372.2723709","DOIUrl":"https://doi.org/10.1145/2723372.2723709","url":null,"abstract":"Science applications are accumulating an ever-increasing amount of multidimensional data. Although some of it can be processed in a relational database, much of it is better suited to array-based engines. As such, it is important to optimize the query processing of these systems. This paper focuses on efficient query processing of join operations within an array database. These engines invariably \"chunk\" their data into multidimensional tiles that they use to efficiently process spatial queries. As such, traditional relational algorithms need to be substantially modified to take advantage of array tiles. Moreover, most n-dimensional science data is unevenly distributed in array space because its underlying observations rarely follow a uniform pattern. It is crucial that the optimization of array joins be skew-aware. In addition, owing to the scale of science applications, their query processing usually spans multiple nodes. This further complicates the planning of array joins. In this paper, we introduce a join optimization framework that is skew-aware for distributed joins. This optimization consists of two phases. In the first, a logical planner selects the query's algorithm (e.g., merge join), the granularity of its tiles, and the reorganization operations needed to align the data. The second phase implements this logical plan by assigning tiles to cluster nodes using an analytical cost model. 
Our experimental results, on both synthetic and real-world data, demonstrate that this optimization framework speeds up array joins by up to 2.5X in comparison to the baseline.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122317395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
SQLGraph: An Efficient Relational-Based Property Graph Store
Wen Sun, Achille Fokoue, Kavitha Srinivas, Anastasios Kementsietsidis, Gang Hu, G. Xie
{"title":"SQLGraph: An Efficient Relational-Based Property Graph Store","authors":"Wen Sun, Achille Fokoue, Kavitha Srinivas, Anastasios Kementsietsidis, Gang Hu, G. Xie","doi":"10.1145/2723372.2723732","DOIUrl":"https://doi.org/10.1145/2723372.2723732","url":null,"abstract":"We show that existing mature, relational optimizers can be exploited with a novel schema to give better performance for property graph storage and retrieval than popular noSQL graph stores. The schema combines relational storage for adjacency information with JSON storage for vertex and edge attributes. We demonstrate that this particular schema design has benefits compared to a purely relational or purely JSON solution. The query translation mechanism translates Gremlin queries with no side effects into SQL queries so that one can leverage relational query optimizers. We also conduct an empirical evaluation of our schema design and query translation mechanism with two existing popular property graph stores. We show that our system is 2-8 times better on query performance, and 10-30 times better in throughput on 4.3 billion edge graphs compared to existing stores.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114062249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 101
Knowledge Curation and Knowledge Fusion: Challenges, Models and Applications
X. Dong, D. Srivastava
{"title":"Knowledge Curation and Knowledge Fusion: Challenges, Models and Applications","authors":"X. Dong, D. Srivastava","doi":"10.1145/2723372.2731083","DOIUrl":"https://doi.org/10.1145/2723372.2731083","url":null,"abstract":"Large-scale knowledge repositories are becoming increasingly important as a foundation for enabling a wide variety of complex applications. In turn, building high-quality knowledge repositories critically depends on the technologies of knowledge curation and knowledge fusion, which share many similar goals with data integration, while facing even more challenges in extracting knowledge from both structured and unstructured data, across a large variety of domains, and in multiple languages. Our tutorial highlights the similarities and differences between knowledge management and data integration, and has two goals. First, we introduce the Database community to the techniques proposed for the problems of entity linkage and relation extraction by the Knowledge Management, Natural Language Processing, and Machine Learning communities. Second, we give a detailed survey of the work done by these communities in knowledge fusion, which is critical to discover and clean errors present in sources and the many mistakes made in the process of knowledge extraction from sources. 
Our tutorial is example driven and hopes to build bridges between the Database community and other disciplines to advance research in this important area.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115971873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
FTT: A System for Finding and Tracking Tourists in Public Transport Services
Huayu Wu, Jo-Anne Tan, W. Ng, Mingqiang Xue, Wei Chen
{"title":"FTT: A System for Finding and Tracking Tourists in Public Transport Services","authors":"Huayu Wu, Jo-Anne Tan, W. Ng, Mingqiang Xue, Wei Chen","doi":"10.1145/2723372.2735367","DOIUrl":"https://doi.org/10.1145/2723372.2735367","url":null,"abstract":"The tourism industry is a key economic driver for many cities. Understanding tourists' traveling patterns can help relevant public and private sectors design and improve their services to serve tourists better and gain additional value from them. The existing approaches to discovering tourists' traveling patterns focus on small sets of known tourists extracted from social media or other channels. The accuracy of the mining result cannot be guaranteed due to the small and biased set of samples. In this paper, we present our system FTT (Finding and Tracking Tourists) to identify tourists from public transport commuters in a city, and to further track their movements from one place to another. Our target is a large set of tourists and their trajectories extracted from public transport riding records, which more accurately represent the movements of general tourists. In particular, we design an iterative learning algorithm to find the tourists among public transport commuters, and provide an interface to answer user queries on tourists' traveling patterns. 
The result will be visualized on top of a city map.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125268940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Chiaroscuro: Transparency and Privacy for Massive Personal Time-Series Clustering
T. Allard, G. Hébrail, F. Masseglia, Esther Pacitti
{"title":"Chiaroscuro: Transparency and Privacy for Massive Personal Time-Series Clustering","authors":"T. Allard, G. Hébrail, F. Masseglia, Esther Pacitti","doi":"10.1145/2723372.2749453","DOIUrl":"https://doi.org/10.1145/2723372.2749453","url":null,"abstract":"The advent of on-body/at-home sensors connected to personal devices leads to the generation of fine-grained, highly sensitive personal data at an unprecedented rate. However, despite the promises of large-scale analytics, there are obvious privacy concerns that prevent individuals from sharing their personal data. In this paper, we propose Chiaroscuro, a complete solution for clustering personal data with strong privacy guarantees. The execution sequence produced by Chiaroscuro is massively distributed on personal devices, coping with arbitrary connections and disconnections. Chiaroscuro builds on our novel data structure, called Diptych, which allows the participating devices to collaborate privately by combining encryption with differential privacy. Our solution yields a high clustering quality while minimizing the impact of the differentially private perturbation. Chiaroscuro is both correct and secure. Finally, we provide an experimental validation of our approach on both real and synthetic sets of time-series.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131162077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26