Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data: Latest Papers

Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2749441

Joy Arulraj, Andrew Pavlo, Subramanya R. Dulloor

Abstract: The advent of non-volatile memory (NVM) will fundamentally change the dichotomy between memory and durable storage in database management systems (DBMSs). These new NVM devices are almost as fast as DRAM, but all writes to them are potentially persistent even after power loss. Existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile. With NVM, many of the components of legacy DBMSs are unnecessary and will degrade the performance of data-intensive applications. To better understand these issues, we implemented three engines in a modular DBMS testbed that are based on different storage management architectures: (1) in-place updates, (2) copy-on-write updates, and (3) log-structured updates. We then present NVM-aware variants of these architectures that leverage the persistence and byte-addressability properties of NVM in their storage and recovery methods. Our experimental evaluation on an NVM hardware emulator shows that these engines achieve up to 5.5X higher throughput than their traditional counterparts while reducing the amount of wear due to write operations by up to 2X. We also demonstrate that our NVM-aware recovery protocols allow these engines to recover almost instantaneously after the DBMS restarts.
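
As a rough illustration of the three storage management architectures named in the abstract above, the following Python sketch contrasts in-place, copy-on-write, and log-structured updates on a toy key-value store. It is an in-memory toy and not the paper's engines: the class names are hypothetical, nothing is persisted, the undo log in the in-place store is an assumed stand-in for a real write-ahead log, and a real copy-on-write engine would copy only the affected pages rather than the whole table.

```python
class InPlaceStore:
    """Overwrites each value where it lives and logs the old value first."""

    def __init__(self):
        self.data = {}
        self.undo_log = []  # (key, previous value) pairs, oldest first

    def put(self, key, value):
        # Record the old value before overwriting so the change can be undone.
        self.undo_log.append((key, self.data.get(key)))
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class CopyOnWriteStore:
    """Never modifies the current version; builds a new one and switches to it."""

    def __init__(self):
        self.current = {}

    def put(self, key, value):
        new_version = dict(self.current)  # copy (the whole table, for simplicity)
        new_version[key] = value
        self.current = new_version  # readers of the old version are unaffected

    def get(self, key):
        return self.current.get(key)


class LogStructuredStore:
    """Appends every update to a log; the newest entry for a key wins."""

    def __init__(self):
        self.log = []

    def put(self, key, value):
        self.log.append((key, value))

    def get(self, key):
        # Scan from the newest entry backwards.
        for logged_key, value in reversed(self.log):
            if logged_key == key:
                return value
        return None
```

All three stores return the same value for a key after the same sequence of `put` calls; they differ in what they leave behind (an undo log, an untouched old version, or an append-only history), which is what determines the recovery work each architecture needs.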

Citations: 206

Smooth Task Migration in Apache Storm

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2764941

Mansheng Yang, Richard T. B. Ma

Citations: 19

Index-based Optimal Algorithms for Computing Steiner Components with Maximum Connectivity

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2746486

Lijun Chang, Xuemin Lin, Lu Qin, J. Yu, W. Zhang

Abstract: With the proliferation of graph applications, the problem of efficiently computing all k-edge connected components of a graph G for a user-given k has been recently investigated. In this paper, we study the problem of efficiently computing the Steiner component with the maximum connectivity; that is, given a set q of query vertices in a graph G, we aim to find the maximum induced subgraph g of G such that g contains q and g has the maximum connectivity, where g is denoted as SMCC. To accommodate online query processing, we present an efficient algorithm based on a novel index such that the algorithm runs in time linear in the result size; thus, the algorithm is optimal since it needs at least linear time to output the result. Moreover, in this paper we also investigate variations of the above problem. We show that such a problem with the constraint that the size of the SMCC is not smaller than a given size can also be solved in time linear in the result size (thus, optimal). We also show that the problem of computing the connectivity (rather than the graph details) of SMCC can be solved in time linear in the query size (thus, optimal). To build the index, we extend the techniques in [7] to accommodate batch processing and computation sharing. To efficiently support applications with graph updates, we also present novel incremental techniques. Finally, we conduct extensive performance studies on large real and synthetic graphs, which demonstrate that our index-based algorithms outperform baseline algorithms by several orders of magnitude and that our indexing algorithms are efficient.

Citations: 51

StoryPivot: Comparing and Contrasting Story Evolution

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2735356

Anja Gruenheid, Donald Kossmann, Theodoros Rekatsinas, D. Srivastava

Abstract: As the world evolves around us, so does the digital coverage of it. Events of diverse types, associated with different actors and various locations, are continuously captured day by day by multiple information sources such as news articles, blogs, and social media. In the digital world, these events are represented through information snippets that contain information on the involved entities, a description of the event, when the event occurred, etc. In our work, we observe that events (and their corresponding digital representations) are often inter-connected, i.e., they form stories which represent evolving relationships between events over time. Take as an example the plane crash in Ukraine in July 2014, which involved multiple entities such as "Ukraine", "Malaysia", and "Russia" and multiple events ranging from the actual crash to the incident investigation and the presentation of the investigator's findings. In this demonstration we present StoryPivot, a framework that helps its users to detect evolving stories in event datasets over time. To resolve stories, we differentiate between story identification, the problem of connecting events over time within a source, and story alignment, the problem of integrating stories across sources. The goal of this demonstration is to present an interactive exploration of both these problems and how events can be dynamically interpreted and put into context in real-world datasets.

Citations: 4

SCREEN: Stream Data Cleaning under Speed Constraints

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2723730

Shaoxu Song, Aoqian Zhang, Jianmin Wang, Philip S. Yu

Abstract: Stream data are often dirty, for example, owing to unreliable sensor readings or erroneous extraction of stock prices. Most stream data cleaning approaches employ a smoothing filter, which may seriously alter the data without preserving the original information. We argue that the cleaning should avoid changing those originally correct/clean data, a.k.a. the minimum change principle in data cleaning. To capture the knowledge about what is clean, we consider the (widely existing) constraints on the speed of data changes, such as fuel consumption per hour, or daily limit of stock prices. Guided by these semantic constraints, in this paper, we propose SCREEN, the first constraint-based approach for cleaning stream data. It is notable that existing data repair techniques clean (a sequence of) data as a whole and fail to support stream computation. To this end, we have to relax the global optimum over the entire sequence to the local optimum in a window. In contrast to the NP-hardness commonly observed for general data repairing problems, our major contributions include: (1) a polynomial-time algorithm for the global optimum, (2) a linear-time algorithm towards the local optimum under an efficient Median Principle, (3) support for out-of-order arrivals of data points, and (4) an adaptive window size for balancing repair accuracy and efficiency. Experiments on real datasets demonstrate that SCREEN achieves significantly higher repair accuracy than existing approaches such as smoothing.
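
To make the notion of a speed constraint in the abstract above concrete, the Python sketch below checks whether two consecutive points change faster than allowed and applies a naive left-to-right repair that clamps each value into the range reachable from the previous repaired point. This is deliberately not the SCREEN algorithm: the abstract's median-based local optimum and windowing are not reproduced, and the function names, the `(timestamp, value)` layout, and the bounds `s_min` and `s_max` are assumptions made for illustration. Timestamps are assumed to be strictly increasing.

```python
def violates_speed_constraint(earlier, later, s_min, s_max):
    """Return True if the rate of change between two points is out of bounds.

    Each point is a (timestamp, value) pair, and `later` must have the
    larger timestamp.
    """
    (t1, x1), (t2, x2) = earlier, later
    speed = (x2 - x1) / (t2 - t1)
    return speed < s_min or speed > s_max


def clamp_repair(points, s_min, s_max):
    """Naive greedy repair (a simplification, not SCREEN's method).

    Walks the sequence once and clamps each value into the interval that
    the speed bounds allow relative to the previous repaired point.
    """
    if not points:
        return []
    repaired = [points[0]]  # the first point is trusted as-is
    for t, x in points[1:]:
        t_prev, x_prev = repaired[-1]
        dt = t - t_prev
        low = x_prev + s_min * dt   # smallest value reachable from x_prev
        high = x_prev + s_max * dt  # largest value reachable from x_prev
        repaired.append((t, min(max(x, low), high)))
    return repaired


readings = [(1, 10.0), (2, 11.0), (3, 30.0), (4, 12.0)]
cleaned = clamp_repair(readings, s_min=-2, s_max=2)
```

In this example only the third reading violates the bounds, so it is the only value the greedy pass changes. A greedy pass like this can still move a point further than necessary because it never looks at later points, which is the kind of shortcoming the window-based local optimum described in the abstract is meant to address.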

Citations: 78

Graft: A Debugging Tool For Apache Giraph

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2735353

S. Salihoglu, Jaeho Shin, V. Khanna, Ba Quan Truong, J. Widom

Citations: 23

STORM: Spatio-Temporal Online Reasoning and Management of Large Spatio-Temporal Data

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2735373

Robert Christensen, Lu Wang, Feifei Li, K. Yi, Jun Tang, Natalee Villa

Citations: 14

GetReal: Towards Realistic Selection of Influence Maximization Strategies in Competitive Networks

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2723710

Hui Li, S. Bhowmick, Jiangtao Cui, Yunjun Gao, Jianfeng Ma

Abstract: State-of-the-art classical influence maximization (IM) techniques are "competition-unaware" as they assume that a group (company) finds seeds (users) in a network independently of other groups who are also simultaneously interested in finding such seeds in the same network. However, in reality several groups often compete for the same market (e.g., Samsung, HTC, and Apple for the smartphone market) and hence may attempt to select seeds in the same network. This has led to an increasing body of research on devising IM techniques for competitive networks. Despite the considerable progress made by these efforts toward finding seeds in more realistic settings, they still make several unrealistic assumptions (e.g., a new company being aware of a rival's strategy, alternate seed selection, etc.), making their deployment impractical in real-world networks. In this paper, we propose a novel framework based on game theory to provide a more realistic solution to the IM problem in competitive networks by jettisoning these unrealistic assumptions. Specifically, we seek to find the "best" IM strategy (an algorithm or a mixture of algorithms) a group should adopt in the presence of rivals so that it can maximize its influence. As each group adopts some strategy, we model the problem as a game with the groups as competitors and the expected influences under the strategies as payoffs. We propose a novel algorithm called GetReal to find each group's best solution by leveraging the competition between different groups. Specifically, it seeks to find whether there exists a Nash Equilibrium (NE) in a game, which guarantees that there exists an "optimal" strategy for each group. Our experimental study on real-world networks demonstrates the superiority of our solution in a more realistic environment.
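
The game-theoretic framing in the abstract above (groups as players, IM strategies as moves, expected influence as payoff) can be illustrated with a small Python sketch that searches a two-player payoff table for pure-strategy Nash equilibria. It is only a sketch under stated assumptions: the function name and the payoff numbers are hypothetical, only two groups are modeled, and mixed strategies (the "mixture of algorithms" the abstract mentions) are not covered, so this is not the GetReal algorithm itself.

```python
def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs of groups A and B
    when A plays strategy i and B plays strategy j. A cell (i, j) is an
    equilibrium if neither group can gain by changing only its own choice.
    """
    rows = len(payoff_a)
    cols = len(payoff_a[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            # A cannot do better by switching rows while B stays at column j.
            best_for_a = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(rows))
            # B cannot do better by switching columns while A stays at row i.
            best_for_b = all(payoff_b[i][j] >= payoff_b[i][m] for m in range(cols))
            if best_for_a and best_for_b:
                equilibria.append((i, j))
    return equilibria


# Hypothetical expected influences for two groups, each choosing between
# two IM algorithms (strategy 0 or strategy 1).
influence_a = [[3, 0], [5, 1]]
influence_b = [[3, 5], [0, 1]]
stable_choices = pure_nash_equilibria(influence_a, influence_b)
```

With these made-up payoffs the only stable outcome is both groups choosing strategy 1. Some games have no pure-strategy equilibrium at all, in which case the function returns an empty list and a mixed-strategy analysis would be needed.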

Citations: 51

k-Shape: Efficient and Accurate Clustering of Time Series

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2737793

John Paparrizos, L. Gravano

Abstract: The proliferation and ubiquity of temporal data across many disciplines have generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data mining methods, not only due to its exploratory power, but also as a preprocessing step or subroutine for other techniques. In this paper, we present k-Shape, a novel algorithm for time-series clustering. k-Shape relies on a scalable iterative refinement procedure, which creates homogeneous and well-separated clusters. As its distance measure, k-Shape uses a normalized version of the cross-correlation measure in order to consider the shapes of time series while comparing them. Based on the properties of that distance measure, we develop a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters. To demonstrate the robustness of k-Shape, we perform an extensive experimental evaluation of our approach against partitional, hierarchical, and spectral clustering methods, with combinations of the most competitive distance measures. k-Shape outperforms all scalable approaches in terms of accuracy. Furthermore, k-Shape also outperforms all non-scalable (and hence impractical) combinations, with one exception that achieves similar accuracy results. However, unlike k-Shape, this combination requires tuning of its distance measure and is two orders of magnitude slower than k-Shape. Overall, k-Shape emerges as a domain-independent, highly accurate, and highly efficient clustering approach for time series with broad applications.
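
The distance measure mentioned in the abstract above, a normalized cross-correlation, can be sketched in plain Python as follows. The abstract does not spell out the formula, so the specific normalization used here (dividing the best cross-correlation over all shifts by the geometric mean of the two series' zero-lag autocorrelations) is an assumption, as are the function names. The quadratic-time loop is chosen for readability; it is not an efficient implementation.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        return [0.0] * n  # a constant series has no shape to compare
    return [(v - mean) / std for v in series]


def cross_correlation(x, y, shift):
    """Sum of x[i + shift] * y[i] over the indices where both exist."""
    n = len(x)
    total = 0.0
    for i in range(n):
        j = i + shift
        if 0 <= j < n:
            total += x[j] * y[i]
    return total


def shape_based_distance(x, y):
    """1 minus the best normalized cross-correlation over all shifts.

    The result lies between 0 (same shape up to a shift) and 2.
    """
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 1.0  # assumption: treat an all-zero series as uncorrelated
    best = max(cross_correlation(x, y, s) for s in range(-(n - 1), n))
    return 1.0 - best / norm


bump = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shifted_bump = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
distance = shape_based_distance(bump, shifted_bump)
```

Because the measure searches over all shifts, the two bumps above are treated as having the same shape even though they occur at different positions. In practice the series would typically be passed through `z_normalize` first so that differences in offset and scale do not dominate the comparison.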

Citations: 265

A Secure Search Engine for the Personal Cloud

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data Pub Date : 2015-05-27 DOI: 10.1145/2723372.2735376

Saliha Lallali, N. Anciaux, I. S. Popa, P. Pucheral

Abstract: The emerging Personal Cloud paradigm holds the promise of a Privacy-by-Design storage and computing platform where personal data remain under the individual's control while being shared by valuable applications. However, leaving control of data management in the user's hands pushes the security issues to the user's platform. This demonstration presents a Secure Personal Cloud Platform relying on a query and access control engine embedded in a tamper-resistant hardware device connected to the user's platform. The main difficulty lies in the design of an inverted document index and its related search and update algorithms capable of tackling the strong hardware constraints of these devices. We have implemented our engine on a real tamper-resistant hardware device and present its capacity to regulate access to a personal dataspace. The objective of this demonstration is to show (1) that secure hardware is a key enabler of the Personal Cloud paradigm and (2) that new embedded indexing and querying techniques can tackle the hardware constraints of tamper-resistant devices and provide scalable solutions for the Personal Cloud.

Citations: 14