Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data: Latest Papers (Page 3)

Efficient Route Planning on Public Transportation Networks: A Labelling Approach

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2749456

Sibo Wang, Wenqing Lin, Yi Yang, Xiaokui Xiao, Shuigeng Zhou

Abstract: A public transportation network can often be modeled as a timetable graph where (i) each node represents a station; and (ii) each directed edge (u,v) is associated with a timetable that records the departure (resp. arrival) time of each vehicle at station u (resp. v). Several techniques have been proposed for various types of route planning on timetable graphs, e.g., retrieving the route from one node to another with the shortest travel time. These techniques, however, either provide insufficient query efficiency or incur significant space overheads. This paper presents Timetable Labelling (TTL), an efficient indexing technique for route planning on timetable graphs. The basic idea of TTL is to associate each node u with a set of labels, each of which records the shortest travel time from u to some other node v given a certain departure time from u; such labels are then used during query processing to improve efficiency. In addition, we propose query algorithms that enable TTL to support three popular types of route planning queries, and investigate how to reduce the space consumption of TTL with advanced preprocessing and label compression methods. By conducting an extensive set of experiments on real-world datasets, we demonstrate that TTL significantly outperforms the state of the art in terms of query efficiency, while incurring moderate preprocessing and space overheads.
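
The timetable graph model described in this abstract can be illustrated with a small sketch. The code below is not TTL: it is a plain Dijkstra-style earliest-arrival search, the kind of index-free baseline that a labelling scheme is meant to speed up. The function name `earliest_arrival`, the dictionary layout, and the assumptions of integer timestamps and zero transfer time at stations are all hypothetical choices made for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Assumed layout: timetable[(u, v)] lists one (departure_from_u, arrival_at_v)
# pair per vehicle travelling along the directed edge (u, v).
Timetable = Dict[Tuple[str, str], List[Tuple[int, int]]]


def earliest_arrival(timetable: Timetable, source: str, target: str,
                     depart_at: int) -> Optional[int]:
    """Earliest time at which `target` can be reached when leaving `source`
    no earlier than `depart_at`; None if it cannot be reached.

    Assumes every trip arrives no earlier than it departs and that
    changing vehicles at a station takes no time.
    """
    # Group the outgoing edges of each station.
    outgoing: Dict[str, List[Tuple[str, List[Tuple[int, int]]]]] = {}
    for (u, v), trips in timetable.items():
        outgoing.setdefault(u, []).append((v, trips))

    best = {source: depart_at}
    heap = [(depart_at, source)]
    while heap:
        time, u = heapq.heappop(heap)
        if u == target:
            return time
        if time > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, trips in outgoing.get(u, []):
            # Only vehicles that depart at or after our arrival at u are usable.
            arrivals = [arr for dep, arr in trips if dep >= time]
            if not arrivals:
                continue
            arrival = min(arrivals)
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                heapq.heappush(heap, (arrival, v))
    return None


timetable = {
    ("A", "B"): [(0, 10), (5, 8)],
    ("B", "C"): [(9, 20), (12, 15)],
    ("A", "C"): [(0, 30)],
}
print(earliest_arrival(timetable, "A", "C", 0))
```

Every query here re-explores the graph from the source, which is the cost that precomputed per-node labels aim to avoid.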

Citations: 96

D2WORM: A Management Infrastructure for Distributed Data-centric Workflows

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2735362

Martin Jergler, Mohammad Sadoghi, H. Jacobsen

Abstract: Unlike traditional activity-flow-based models, data-centric workflows primarily focus on the data to drive a business. This enables the unification of operational management, concurrent process analytics, compliance with process or associated data constraints, and adaptability to changing environments. In this demonstration, we present D2Worm, a Distributed Data-centric Workflow Management system. D2Worm allows users to (1) graphically model data-centric workflows in a declarative fashion based on the Guard-Stage-Milestone (GSM) meta-model, (2) automatically compile the modelled workflow into several fine-granular workflow units (WFUs), and (3) deploy these WFUs on distributed infrastructures. A WFU is a system component that manages a subset of the workflow's data model and, at the same time, represents part of the global control flow by evaluating conditions over the data. WFUs communicate with each other over a publish/subscribe messaging infrastructure that allows the architecture to scale from a single node to dozens of machines distributed over different data-centers. In addition, D2Worm is able to (4) concurrently execute multiple workflow instances and monitor their behavior in real-time.

Citations: 11

Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype?

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2742911

C. Ré, D. Agrawal, M. Balazinska, Michael J. Cafarella, Michael I. Jordan, Tim Kraska, R. Ramakrishnan

Citations: 24

ShareInsights: An Unified Approach to Full-stack Data Processing

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2742800

Mukund Deshpande

Abstract: The field of data analysis seeks to extract value from data for either business or scientific benefit. This field has seen renewed interest with the advent of big data technologies and a new organizational role called data scientist. Even with the newfound focus, the task of analyzing large amounts of data is still challenging and time-consuming. The essence of data analysis involves setting up data pipelines, which consist of several operations that are chained together, starting from data collection, data quality checks, data integration, data analysis and data visualization (including the setting up of interaction paths in that visualization). In our opinion, the challenges stem from the technology diversity at each stage of the data pipeline as well as the lack of process around the analysis. In this paper we present a platform that aims to significantly reduce the time it takes to build data pipelines. The platform attempts to achieve this in the following ways. Allow the user to describe the entire data pipeline with a single language and idioms, all the way from data ingestion to insight expression (via visualization and end-user interaction). Provide a rich library of parts that allow users to quickly assemble a data analysis pipeline in the language. Allow for a collaboration model that allows multiple users to work together on a data analysis pipeline as well as leverage and extend prior work with minimal effort. We studied the efficacy of the platform for a data hackathon competition conducted in our organization. The hackathon provided us with a way to study the impact of the approach. Rich data pipelines which traditionally took weeks to build were constructed and deployed in hours. Consequently, we believe that the complexity of designing and running the data analysis pipeline can be significantly reduced, leading to a marked improvement in the productivity of data analysts/data scientists.

Citations: 6

Efficient Enumeration of Maximal k-Plexes

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2746478

D. Berlowitz, Sara Cohen, B. Kimelfeld

Abstract: The problem of enumerating (i.e., generating) all maximal cliques in a graph has received extensive treatment, due to the plethora of applications in various areas such as data mining, bioinformatics, network analysis and community detection. However, requiring the enumerated subgraphs to be full cliques is too restrictive in common real-life scenarios where "almost cliques" are equally useful. Hence, the notion of a k-plex, a clique relaxation that allows every node to be "missing" k neighbors, has been introduced. But this seemingly minor relaxation renders existing algorithms for clique enumeration inapplicable, for inherent reasons. This paper presents the first provably efficient algorithms, both for enumerating the maximal k-plexes and for enumerating the maximal connected k-plexes. Our algorithms run in polynomial delay for a constant k and incremental FPT delay when k is a parameter. The importance of such algorithms is in the areas mentioned above, as well as in new applications. Extensive experimentation over both real and synthetic datasets shows the efficiency of our algorithms, and their scalability with respect to graph size, density and choice of k, as well as their clear superiority over the state-of-the-art.
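
The k-plex relaxation mentioned in this abstract can be made concrete with a small membership check. The sketch below assumes the commonly used definition in which a node set S is a k-plex when every node of S is adjacent to at least |S| - k other nodes of S, so that a 1-plex is an ordinary clique; the abstract states the idea only informally and may count "missing" neighbors differently. The helper name `is_k_plex` and the adjacency-dictionary layout are hypothetical, and the code only verifies a given set. It does not enumerate maximal k-plexes.

```python
from typing import Dict, Iterable, Set


def is_k_plex(adj: Dict[str, Set[str]], nodes: Iterable[str], k: int) -> bool:
    """Check whether `nodes` forms a k-plex in the undirected graph `adj`.

    Assumed definition: each node in the set must be adjacent to at least
    |S| - k of the other nodes in the set.
    """
    s = set(nodes)
    for v in s:
        # Neighbours of v that lie inside the candidate set (excluding v itself).
        inside = adj.get(v, set()) & (s - {v})
        if len(inside) < len(s) - k:
            return False
    return True


# A triangle a-b-c with a pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(is_k_plex(graph, ["a", "b", "c"], 1))
print(is_k_plex(graph, ["a", "b", "c", "d"], 2))
```

Under this definition the triangle passes the check for k = 1, while adding d fails for k = 2 because d has only one neighbor among the four nodes.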

Citations: 65

Skew-Aware Join Optimization for Array Databases

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2723709

Jennie Duggan, Olga Papaemmanouil, L. Battle, M. Stonebraker

Abstract: Science applications are accumulating an ever-increasing amount of multidimensional data. Although some of it can be processed in a relational database, much of it is better suited to array-based engines. As such, it is important to optimize the query processing of these systems. This paper focuses on efficient query processing of join operations within an array database. These engines invariably "chunk" their data into multidimensional tiles that they use to efficiently process spatial queries. As such, traditional relational algorithms need to be substantially modified to take advantage of array tiles. Moreover, most n-dimensional science data is unevenly distributed in array space because its underlying observations rarely follow a uniform pattern. It is crucial that the optimization of array joins be skew-aware. In addition, owing to the scale of science applications, their query processing usually spans multiple nodes. This further complicates the planning of array joins. In this paper, we introduce a join optimization framework that is skew-aware for distributed joins. This optimization consists of two phases. In the first, a logical planner selects the query's algorithm (e.g., merge join), the granularity of its tiles, and the reorganization operations needed to align the data. The second phase implements this logical plan by assigning tiles to cluster nodes using an analytical cost model. Our experimental results, on both synthetic and real-world data, demonstrate that this optimization framework speeds up array joins by up to 2.5X in comparison to the baseline.
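
To give a feel for why tile-to-node assignment matters under skew, the following sketch balances unevenly sized tiles across cluster nodes. It uses a generic greedy heuristic (largest tile first, always onto the least-loaded node) with tile size as the only cost. This is a stand-in for illustration and not the analytical cost model from the paper; the function name `assign_tiles` and the tile names and sizes are made up.

```python
import heapq
from typing import Dict, List


def assign_tiles(tile_sizes: Dict[str, int], num_nodes: int) -> Dict[int, List[str]]:
    """Greedily place tiles on nodes so that per-node load stays balanced.

    Assumption: the cost of a tile is just its size (e.g. its cell count).
    """
    # Min-heap of (current_load, node_id); the least-loaded node is popped first.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment: Dict[int, List[str]] = {node: [] for node in range(num_nodes)}

    # Largest tiles first; ties broken by name so the result is deterministic.
    for tile, size in sorted(tile_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        assignment[node].append(tile)
        heapq.heappush(heap, (load + size, node))
    return assignment


# One dense tile and three sparse ones, spread over two nodes.
print(assign_tiles({"t1": 100, "t2": 10, "t3": 10, "t4": 10}, 2))
```

With skewed sizes like these, a round-robin placement would put the dense tile and a sparse tile on the same node, whereas the greedy placement isolates the dense tile.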

Citations: 26

SQLGraph: An Efficient Relational-Based Property Graph Store

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2723732

Wen Sun, Achille Fokoue, Kavitha Srinivas, Anastasios Kementsietsidis, Gang Hu, G. Xie

Citations: 101

Knowledge Curation and Knowledge Fusion: Challenges, Models and Applications

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2731083

X. Dong, D. Srivastava

Abstract: Large-scale knowledge repositories are becoming increasingly important as a foundation for enabling a wide variety of complex applications. In turn, building high-quality knowledge repositories critically depends on the technologies of knowledge curation and knowledge fusion, which share many similar goals with data integration, while facing even more challenges in extracting knowledge from both structured and unstructured data, across a large variety of domains, and in multiple languages. Our tutorial highlights the similarities and differences between knowledge management and data integration, and has two goals. First, we introduce the Database community to the techniques proposed for the problems of entity linkage and relation extraction by the Knowledge Management, Natural Language Processing, and Machine Learning communities. Second, we give a detailed survey of the work done by these communities in knowledge fusion, which is critical to discover and clean errors present in sources and the many mistakes made in the process of knowledge extraction from sources. Our tutorial is example driven and hopes to build bridges between the Database community and other disciplines to advance research in this important area.

Citations: 21

FTT: A System for Finding and Tracking Tourists in Public Transport Services

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2735367

Huayu Wu, Jo-Anne Tan, W. Ng, Mingqiang Xue, Wei Chen

Abstract: The tourism industry is a key economic driver for many cities. Understanding tourists' traveling patterns can help the relevant public and private sectors design and improve their services to serve tourists better and derive additional value from them. The existing approaches to discovering tourists' traveling patterns focus on small sets of known tourists extracted from social media or other channels. The accuracy of the mining result cannot be guaranteed due to the small and biased set of samples. In this paper, we present our system FTT (Finding and Tracking Tourists) to identify tourists from public transport commuters in a city, and to further track their movements from one place to another. Our target is a large set of tourists and their trajectories extracted from public transport riding records, which more accurately represent the movements of general tourists. In particular, we design an iterative learning algorithm to find the tourists among public transport commuters, and provide an interface to answer user queries on tourists' traveling patterns. The result will be visualized on top of a city map.

Citations: 8

Chiaroscuro: Transparency and Privacy for Massive Personal Time-Series Clustering

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2749453

T. Allard, G. Hébrail, F. Masseglia, Esther Pacitti

Citations: 26