{"title":"Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems","authors":"Joy Arulraj, Andrew Pavlo, Subramanya R. Dulloor","doi":"10.1145/2723372.2749441","DOIUrl":"https://doi.org/10.1145/2723372.2749441","url":null,"abstract":"The advent of non-volatile memory (NVM) will fundamentally change the dichotomy between memory and durable storage in database management systems (DBMSs). These new NVM devices are almost as fast as DRAM, but all writes to them are potentially persistent even after power loss. Existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile. With NVM, many of the components of legacy DBMSs are unnecessary and will degrade the performance of data intensive applications. To better understand these issues, we implemented three engines in a modular DBMS testbed that are based on different storage management architectures: (1) in-place updates, (2) copy-on-write updates, and (3) log-structured updates. We then present NVM-aware variants of these architectures that leverage the persistence and byte-addressability properties of NVM in their storage and recovery methods. Our experimental evaluation on an NVM hardware emulator shows that these engines achieve up to 5.5X higher throughput than their traditional counterparts while reducing the amount of wear due to write operations by up to 2X. We also demonstrate that our NVM-aware recovery protocols allow these engines to recover almost instantaneously after the DBMS restarts.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116881097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smooth Task Migration in Apache Storm","authors":"Mansheng Yang, Richard T. B. Ma","doi":"10.1145/2723372.2764941","DOIUrl":"https://doi.org/10.1145/2723372.2764941","url":null,"abstract":"Task migration happens when distributed data processing systems scale in real-time. To handle the task migration process more gracefully, we propose three task migration methods: (i) worker level migration, (ii) executor level migration, and (iii) executor level migration with reliable messaging. We implement our migration methods on Apache Storm. Our experiments show that, compared with Storm's original migration implementation, our methods significantly reduce the performance degradation and the number of task failures during each migration.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"388 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124198797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Index-based Optimal Algorithms for Computing Steiner Components with Maximum Connectivity","authors":"Lijun Chang, Xuemin Lin, Lu Qin, J. Yu, W. Zhang","doi":"10.1145/2723372.2746486","DOIUrl":"https://doi.org/10.1145/2723372.2746486","url":null,"abstract":"With the proliferation of graph applications, the problem of efficiently computing all k-edge connected components of a graph G for a user-given k has been recently investigated. In this paper, we study the problem of efficiently computing the Steiner component with the maximum connectivity; that is, given a set q of query vertices in a graph G, we aim to find the maximum induced subgraph g of G such that g contains q and g has the maximum connectivity, where g is denoted as SMCC. To accommodate online query processing, we present an efficient algorithm based on a novel index such that the algorithm runs in linear time regarding the result size; thus, the algorithm is optimal since it needs at least linear time to output the result. Moreover, in this paper we also investigate variations of the above problem. We show that such a problem with the constraint that the size of the SMCC is not smaller than a given size can also be solved in linear time regarding the result size (thus, optimal). We also show that the problem of computing the connectivity (rather than the graph details) of SMCC can be solved in linear time regarding the query size (thus, optimal). To build the index, we extend the techniques in [7] to accommodate batch processing and computation sharing. To efficiently support the applications with graph updates, we also present novel incremental techniques. Finally, we conduct extensive performance studies on large real and synthetic graphs, which demonstrate that our index-based algorithms significantly outperform baseline algorithms by several orders of magnitude and our indexing algorithms are efficient.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122657347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"StoryPivot: Comparing and Contrasting Story Evolution","authors":"Anja Gruenheid, Donald Kossmann, Theodoros Rekatsinas, D. Srivastava","doi":"10.1145/2723372.2735356","DOIUrl":"https://doi.org/10.1145/2723372.2735356","url":null,"abstract":"As the world evolves around us, so does the digital coverage of it. Events of diverse types, associated with different actors and various locations, are continuously captured by multiple information sources such as news articles, blogs, social media, etc., day by day. In the digital world, these events are represented through information snippets that contain information on the involved entities, a description of the event, when the event occurred, etc. In our work, we observe that events (and their corresponding digital representations) are often inter-connected, i.e., they form stories which represent evolving relationships between events over time. Take as an example the plane crash in Ukraine in July 2014 which involved multiple entities such as \"Ukraine\", \"Malaysia\", and \"Russia\" and multiple events ranging from the actual crash to the incident investigation and the presentation of the investigator's findings. In this demonstration we present StoryPivot, a framework that helps its users to detect evolving stories in event datasets over time. To resolve stories, we differentiate between story identification, the problem of connecting events over time within a source, and story alignment, the problem of integrating stories across sources. The goal of this demonstration is to present an interactive exploration of both these problems and how events can be dynamically interpreted and put into context in real-world datasets.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129414206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCREEN: Stream Data Cleaning under Speed Constraints","authors":"Shaoxu Song, Aoqian Zhang, Jianmin Wang, Philip S. Yu","doi":"10.1145/2723372.2723730","DOIUrl":"https://doi.org/10.1145/2723372.2723730","url":null,"abstract":"Stream data are often dirty, for example, owing to unreliable sensor readings, or erroneous extraction of stock prices. Most stream data cleaning approaches employ a smoothing filter, which may seriously alter the data without preserving the original information. We argue that the cleaning should avoid changing those originally correct/clean data, a.k.a. the minimum change principle in data cleaning. To capture the knowledge about what is clean, we consider the (widely existing) constraints on the speed of data changes, such as fuel consumption per hour, or daily limit of stock prices. Guided by these semantic constraints, in this paper, we propose SCREEN, the first constraint-based approach for cleaning stream data. It is notable that existing data repair techniques clean (a sequence of) data as a whole and fail to support stream computation. To this end, we have to relax the global optimum over the entire sequence to the local optimum in a window. Rather than the commonly observed NP-hardness of general data repairing problems, our major contributions include: (1) a polynomial-time algorithm for the global optimum, (2) a linear-time algorithm towards the local optimum under an efficient Median Principle, (3) support for out-of-order arrivals of data points, and (4) an adaptive window size for balancing repair accuracy and efficiency. Experiments on real datasets demonstrate that SCREEN can show significantly higher repair accuracy than the existing approaches such as smoothing.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128736271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graft: A Debugging Tool For Apache Giraph","authors":"S. Salihoglu, Jaeho Shin, V. Khanna, Ba Quan Truong, J. Widom","doi":"10.1145/2723372.2735353","DOIUrl":"https://doi.org/10.1145/2723372.2735353","url":null,"abstract":"We address the problem of debugging programs written for Pregel-like systems. After interviewing Giraph and GPS users, we developed Graft. Graft supports the debugging cycle that users typically go through: (1) Users describe programmatically the set of vertices they are interested in inspecting. During execution, Graft captures the context information of these vertices across supersteps. (2) Using Graft's GUI, users visualize how the values and messages of the captured vertices change from superstep to superstep, narrowing in on suspicious vertices and supersteps. (3) Users replay the exact lines of the vertex.compute() function that executed for the suspicious vertices and supersteps, by copying code that Graft generates into their development environments' line-by-line debuggers. Graft also has features to construct end-to-end tests for Giraph programs. Graft is open-source and fully integrated into Apache Giraph's main code base.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130045731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STORM: Spatio-Temporal Online Reasoning and Management of Large Spatio-Temporal Data","authors":"Robert Christensen, Lu Wang, Feifei Li, K. Yi, Jun Tang, Natalee Villa","doi":"10.1145/2723372.2735373","DOIUrl":"https://doi.org/10.1145/2723372.2735373","url":null,"abstract":"We present the STORM system to enable spatio-temporal online reasoning and management of large spatio-temporal data. STORM supports interactive spatio-temporal analytics through novel spatial online sampling techniques. Online spatio-temporal aggregation and analytics are then derived based on the online samples, where approximate answers with approximation quality guarantees can be provided immediately from the start of query execution. The quality of these online approximations improves over time. This demonstration proposal describes key ideas in the design of the STORM system, and presents the demonstration plan.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"460 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127531226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GetReal: Towards Realistic Selection of Influence Maximization Strategies in Competitive Networks","authors":"Hui Li, S. Bhowmick, Jiangtao Cui, Yunjun Gao, Jianfeng Ma","doi":"10.1145/2723372.2723710","DOIUrl":"https://doi.org/10.1145/2723372.2723710","url":null,"abstract":"State-of-the-art classical influence maximization (IM) techniques are \"competition-unaware\" as they assume that a group (company) finds seeds (users) in a network independent of other groups who are also simultaneously interested in finding such seeds in the same network. However, in reality several groups often compete for the same market (e.g., Samsung, HTC, and Apple for the smart phone market) and hence may attempt to select seeds in the same network. This has led to an increasing body of research in devising IM techniques for competitive networks. Despite the considerable progress made by these efforts toward finding seeds in more realistic settings, unfortunately, they still make several unrealistic assumptions (e.g., a new company being aware of a rival's strategy, alternate seed selection, etc.) making their deployment impractical in real-world networks. In this paper, we propose a novel framework based on game theory to provide a more realistic solution to the IM problem in competitive networks by jettisoning these unrealistic assumptions. Specifically, we seek to find the \"best\" IM strategy (an algorithm or a mixture of algorithms) a group should adopt in the presence of rivals so that it can maximize its influence. As each group adopts some strategy, we model the problem as a game with the groups as competitors and the expected influences under the strategies as payoffs. We propose a novel algorithm called GetReal to find each group's best solution by leveraging the competition between different groups. Specifically, it seeks to find whether there exists a Nash Equilibrium (NE) in a game, which guarantees that there exists an \"optimal\" strategy for each group. Our experimental study on real-world networks demonstrates the superiority of our solution in a more realistic environment.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128647791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"k-Shape: Efficient and Accurate Clustering of Time Series","authors":"John Paparrizos, L. Gravano","doi":"10.1145/2723372.2737793","DOIUrl":"https://doi.org/10.1145/2723372.2737793","url":null,"abstract":"The proliferation and ubiquity of temporal data across many disciplines has generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data mining methods, not only due to its exploratory power, but also as a preprocessing step or subroutine for other techniques. In this paper, we present k-Shape, a novel algorithm for time-series clustering. k-Shape relies on a scalable iterative refinement procedure, which creates homogeneous and well-separated clusters. As its distance measure, k-Shape uses a normalized version of the cross-correlation measure in order to consider the shapes of time series while comparing them. Based on the properties of that distance measure, we develop a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters. To demonstrate the robustness of k-Shape, we perform an extensive experimental evaluation of our approach against partitional, hierarchical, and spectral clustering methods, with combinations of the most competitive distance measures. k-Shape outperforms all scalable approaches in terms of accuracy. Furthermore, k-Shape also outperforms all non-scalable (and hence impractical) combinations, with one exception that achieves similar accuracy results. However, unlike k-Shape, this combination requires tuning of its distance measure and is two orders of magnitude slower than k-Shape. Overall, k-Shape emerges as a domain-independent, highly accurate, and highly efficient clustering approach for time series with broad applications.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121163443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Secure Search Engine for the Personal Cloud","authors":"Saliha Lallali, N. Anciaux, I. S. Popa, P. Pucheral","doi":"10.1145/2723372.2735376","DOIUrl":"https://doi.org/10.1145/2723372.2735376","url":null,"abstract":"The emerging Personal Cloud paradigm holds the promise of a Privacy-by-Design storage and computing platform where personal data remain under the individual's control while being shared by valuable applications. However, leaving the data management control in the user's hands pushes the security issues to the user's platform. This demonstration presents a Secure Personal Cloud Platform relying on a query and access control engine embedded in a tamper-resistant hardware device connected to the user's platform. The main difficulty lies in the design of an inverted document index and its related search and update algorithms capable of tackling the strong hardware constraints of these devices. We have implemented our engine on a real tamper-resistant hardware device and present its capacity to regulate the access to a personal dataspace. The objective of this demonstration is to show (1) that secure hardware is a key enabler of the Personal Cloud paradigm and (2) that new embedded indexing and querying techniques can tackle the hardware constraints of tamper-resistant devices and provide scalable solutions for the Personal Cloud.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121140205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}