Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data: Latest Articles

Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems
Joy Arulraj, Andrew Pavlo, Subramanya R. Dulloor
DOI: https://doi.org/10.1145/2723372.2749441
Published: 2015-05-27
Abstract: The advent of non-volatile memory (NVM) will fundamentally change the dichotomy between memory and durable storage in database management systems (DBMSs). These new NVM devices are almost as fast as DRAM, but all writes to them are potentially persistent even after power loss. Existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile. With NVM, many of the components of legacy DBMSs are unnecessary and will degrade the performance of data-intensive applications. To better understand these issues, we implemented three engines in a modular DBMS testbed that are based on different storage management architectures: (1) in-place updates, (2) copy-on-write updates, and (3) log-structured updates. We then present NVM-aware variants of these architectures that leverage the persistence and byte-addressability properties of NVM in their storage and recovery methods. Our experimental evaluation on an NVM hardware emulator shows that these engines achieve up to 5.5X higher throughput than their traditional counterparts while reducing the amount of wear due to write operations by up to 2X. We also demonstrate that our NVM-aware recovery protocols allow these engines to recover almost instantaneously after the DBMS restarts.
Citations: 206
Smooth Task Migration in Apache Storm
Mansheng Yang, Richard T. B. Ma
DOI: https://doi.org/10.1145/2723372.2764941
Published: 2015-05-27
Abstract: Task migration happens when distributed data processing systems scale in real time. To handle the task migration process more gracefully, we propose three task migration methods: (i) worker-level migration, (ii) executor-level migration, and (iii) executor-level migration with reliable messaging. We implement our migration methods on Apache Storm. Our experiments show that, compared with Storm's original migration implementation, our methods significantly reduce the performance degradation and the number of task failures during each migration.
Citations: 19
Index-based Optimal Algorithms for Computing Steiner Components with Maximum Connectivity
Lijun Chang, Xuemin Lin, Lu Qin, J. Yu, W. Zhang
DOI: https://doi.org/10.1145/2723372.2746486
Published: 2015-05-27
Abstract: With the proliferation of graph applications, the problem of efficiently computing all k-edge connected components of a graph G for a user-given k has recently been investigated. In this paper, we study the problem of efficiently computing the Steiner component with the maximum connectivity; that is, given a set q of query vertices in a graph G, we aim to find the maximum induced subgraph g of G such that g contains q and g has the maximum connectivity, where g is denoted SMCC. To accommodate online query processing, we present an efficient algorithm based on a novel index such that the algorithm runs in time linear in the result size; thus, the algorithm is optimal since it needs at least linear time to output the result. Moreover, in this paper we also investigate variations of the above problem. We show that the problem with the constraint that the size of the SMCC is not smaller than a given size can also be solved in time linear in the result size (thus, optimally). We also show that the problem of computing the connectivity (rather than the graph details) of the SMCC can be solved in time linear in the query size (thus, optimally). To build the index, we extend the techniques in [7] to accommodate batch processing and computation sharing. To efficiently support applications with graph updates, we also present novel incremental techniques. Finally, we conduct extensive performance studies on large real and synthetic graphs, which demonstrate that our index-based algorithms outperform baseline algorithms by several orders of magnitude and that our indexing algorithms are efficient.
Citations: 51
StoryPivot: Comparing and Contrasting Story Evolution
Anja Gruenheid, Donald Kossmann, Theodoros Rekatsinas, D. Srivastava
DOI: https://doi.org/10.1145/2723372.2735356
Published: 2015-05-27
Abstract: As the world evolves around us, so does the digital coverage of it. Events of diverse types, associated with different actors and various locations, are continuously captured day by day by multiple information sources such as news articles, blogs, and social media. In the digital world, these events are represented through information snippets that contain information on the involved entities, a description of the event, when the event occurred, etc. In our work, we observe that events (and their corresponding digital representations) are often interconnected, i.e., they form stories which represent evolving relationships between events over time. Take as an example the plane crash in Ukraine in July 2014, which involved multiple entities such as "Ukraine", "Malaysia", and "Russia" and multiple events ranging from the actual crash to the incident investigation and the presentation of the investigators' findings. In this demonstration we present StoryPivot, a framework that helps its users to detect evolving stories in event datasets over time. To resolve stories, we differentiate between story identification, the problem of connecting events over time within a source, and story alignment, the problem of integrating stories across sources. The goal of this demonstration is to present an interactive exploration of both these problems and of how events can be dynamically interpreted and put into context in real-world datasets.
Citations: 4
SCREEN: Stream Data Cleaning under Speed Constraints
Shaoxu Song, Aoqian Zhang, Jianmin Wang, Philip S. Yu
DOI: https://doi.org/10.1145/2723372.2723730
Published: 2015-05-27
Abstract: Stream data are often dirty, for example, owing to unreliable sensor readings or erroneous extraction of stock prices. Most stream data cleaning approaches employ a smoothing filter, which may seriously alter the data without preserving the original information. We argue that cleaning should avoid changing data that are originally correct (clean), a.k.a. the minimum change principle in data cleaning. To capture knowledge about what is clean, we consider the (widely existing) constraints on the speed of data changes, such as fuel consumption per hour or the daily limit of stock prices. Guided by these semantic constraints, in this paper we propose SCREEN, the first constraint-based approach for cleaning stream data. Notably, existing data repair techniques clean (a sequence of) data as a whole and fail to support stream computation. To this end, we relax the global optimum over the entire sequence to a local optimum in a window. In contrast to the NP-hardness commonly observed for general data repair problems, our major contributions include: (1) a polynomial-time algorithm for the global optimum, (2) a linear-time algorithm for the local optimum under an efficient Median Principle, (3) support for out-of-order arrivals of data points, and (4) an adaptive window size for balancing repair accuracy and efficiency. Experiments on real datasets demonstrate that SCREEN achieves significantly higher repair accuracy than existing approaches such as smoothing.
Citations: 78
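To make the speed-constraint idea concrete, here is a deliberately simplified sketch, not SCREEN's algorithm: a single greedy pass that trusts the first point, leaves every later point untouched if it is reachable from the previous repaired point within speeds [s_min, s_max], and otherwise clips it to the nearest reachable value (minimum change). The function name `greedy_speed_repair` and the sample stream are hypothetical; the paper's window-based Median Principle and global-optimum algorithm are not reproduced here.

```python
from typing import List, Sequence, Tuple


def greedy_speed_repair(
    times: Sequence[float],
    values: Sequence[float],
    s_min: float,
    s_max: float,
) -> Tuple[List[float], int]:
    """Repair values so every consecutive speed lies in [s_min, s_max].

    Simplified greedy pass (an illustration, not the paper's method): the first
    point is trusted; each later point is kept if it is reachable from the
    previous *repaired* point, otherwise it is clipped to the nearest reachable
    value. Returns the repaired series and the number of changed points.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have equal length")
    if not values:
        return [], 0
    repaired = [float(values[0])]
    changed = 0
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Range of values reachable from the previous repaired point.
        low = repaired[-1] + s_min * dt
        high = repaired[-1] + s_max * dt
        fixed = min(max(values[i], low), high)
        if fixed != values[i]:
            changed += 1
        repaired.append(float(fixed))
    return repaired, changed


# Hypothetical sensor stream with one spike; the value may change by at most 2 per tick.
print(greedy_speed_repair([1, 2, 3, 4, 5], [0, 1, 100, 3, 4], s_min=-2, s_max=2))
# prints: ([0.0, 1.0, 3.0, 3.0, 4.0], 1)
```

Because this pass only looks backward, a dirty first point would be trusted and could force later clean points to be changed; the abstract's windowed local optimum considers later points as well, which this sketch does not.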
Graft: A Debugging Tool For Apache Giraph
S. Salihoglu, Jaeho Shin, V. Khanna, Ba Quan Truong, J. Widom
DOI: https://doi.org/10.1145/2723372.2735353
Published: 2015-05-27
Abstract: We address the problem of debugging programs written for Pregel-like systems. After interviewing Giraph and GPS users, we developed Graft. Graft supports the debugging cycle that users typically go through: (1) Users programmatically describe the set of vertices they are interested in inspecting. During execution, Graft captures the context information of these vertices across supersteps. (2) Using Graft's GUI, users visualize how the values and messages of the captured vertices change from superstep to superstep, narrowing in on suspicious vertices and supersteps. (3) Users replay the exact lines of the vertex.compute() function that executed for the suspicious vertices and supersteps, by copying code that Graft generates into their development environments' line-by-line debuggers. Graft also has features to construct end-to-end tests for Giraph programs. Graft is open-source and fully integrated into Apache Giraph's main code base.
Citations: 23
STORM: Spatio-Temporal Online Reasoning and Management of Large Spatio-Temporal Data
Robert Christensen, Lu Wang, Feifei Li, K. Yi, Jun Tang, Natalee Villa
DOI: https://doi.org/10.1145/2723372.2735373
Published: 2015-05-27
Abstract: We present the STORM system to enable spatio-temporal online reasoning and management of large spatio-temporal data. STORM supports interactive spatio-temporal analytics through novel spatial online sampling techniques. Online spatio-temporal aggregation and analytics are then derived from the online samples, so approximate answers with approximation quality guarantees can be provided immediately from the start of query execution. The quality of these online approximations improves over time. This demonstration proposal describes key ideas in the design of the STORM system and presents the demonstration plan.
Citations: 14
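The general pattern of an online aggregate whose quality "improves over time" can be sketched as follows. This is an assumption-laden stand-in, not STORM's technique: it assumes uniform random samples arrive one at a time and reports a running mean with a CLT-based confidence interval (Welford's algorithm for the running variance). STORM's spatial online sampling structures are not modeled, and the class name `OnlineMeanEstimator` and the synthetic data are hypothetical.

```python
import math
import random
from typing import Tuple


class OnlineMeanEstimator:
    """Running mean with a CLT-based confidence interval (Welford's algorithm)."""

    def __init__(self, z: float = 1.96) -> None:
        self.z = z  # 1.96 is the two-sided 95% normal quantile
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value: float) -> None:
        """Fold one new sample into the running statistics."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    def estimate(self) -> Tuple[float, float]:
        """Return (mean, half-width); the half-width is inf until two samples arrive."""
        if self.n < 2:
            return self.mean, math.inf
        std = math.sqrt(self._m2 / (self.n - 1))  # sample standard deviation
        return self.mean, self.z * std / math.sqrt(self.n)


# Hypothetical measurements, visited in random order to mimic online sampling.
rng = random.Random(42)
population = [rng.uniform(0.0, 100.0) for _ in range(50_000)]
estimator = OnlineMeanEstimator()
for count, value in enumerate(rng.sample(population, 5_000), start=1):
    estimator.add(value)
    if count in (50, 500, 5_000):
        mean, half = estimator.estimate()
        print(f"after {count:>5} samples: {mean:.2f} +/- {half:.2f}")
```

The interval half-width shrinks roughly as 1/sqrt(n), which is one common way an online answer can be refined while the query keeps running.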
GetReal: Towards Realistic Selection of Influence Maximization Strategies in Competitive Networks
Hui Li, S. Bhowmick, Jiangtao Cui, Yunjun Gao, Jianfeng Ma
DOI: https://doi.org/10.1145/2723372.2723710
Published: 2015-05-27
Abstract: State-of-the-art classical influence maximization (IM) techniques are "competition-unaware" as they assume that a group (company) finds seeds (users) in a network independently of other groups who are also simultaneously interested in finding such seeds in the same network. However, in reality several groups often compete for the same market (e.g., Samsung, HTC, and Apple for the smartphone market) and hence may attempt to select seeds in the same network. This has led to an increasing body of research on devising IM techniques for competitive networks. Despite the considerable progress made by these efforts toward finding seeds in more realistic settings, unfortunately, they still make several unrealistic assumptions (e.g., a new company being aware of a rival's strategy, alternate seed selection, etc.), making their deployment impractical in real-world networks. In this paper, we propose a novel framework based on game theory to provide a more realistic solution to the IM problem in competitive networks by jettisoning these unrealistic assumptions. Specifically, we seek to find the "best" IM strategy (an algorithm or a mixture of algorithms) a group should adopt in the presence of rivals so that it can maximize its influence. As each group adopts some strategy, we model the problem as a game with each group as a competitor and the expected influences under the strategies as payoffs. We propose a novel algorithm called GetReal to find each group's best solution by leveraging the competition between different groups. Specifically, it seeks to find whether there exists a Nash Equilibrium (NE) in a game, which guarantees that there exists an "optimal" strategy for each group. Our experimental study on real-world networks demonstrates the superiority of our solution in a more realistic environment.
Citations: 51
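The game model described above (groups as players, expected influence as payoffs) can be illustrated with a small equilibrium check. This is a minimal sketch under stated assumptions, not GetReal: it handles only two groups, checks only pure-strategy Nash equilibria (GetReal also considers mixtures of algorithms), and uses hypothetical payoff numbers in place of influence estimates computed on a real network. The function name `pure_nash_equilibria` is illustrative.

```python
from itertools import product
from typing import List, Sequence, Tuple

Payoffs = Sequence[Sequence[float]]


def pure_nash_equilibria(p1: Payoffs, p2: Payoffs) -> List[Tuple[int, int]]:
    """Return every (i, j) profile where neither group gains by deviating alone.

    p1[i][j] and p2[i][j] are the payoffs (here: expected influence) of group 1
    and group 2 when group 1 plays strategy i and group 2 plays strategy j.
    """
    rows, cols = len(p1), len(p1[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        # Group 1 cannot do better by switching its strategy while j is fixed.
        best_for_1 = all(p1[i][j] >= p1[k][j] for k in range(rows))
        # Group 2 cannot do better by switching its strategy while i is fixed.
        best_for_2 = all(p2[i][j] >= p2[i][l] for l in range(cols))
        if best_for_1 and best_for_2:
            equilibria.append((i, j))
    return equilibria


# Hypothetical expected-influence payoffs for two candidate IM strategies per group.
p1 = [[3.0, 0.0], [5.0, 1.0]]
p2 = [[3.0, 5.0], [0.0, 1.0]]
print(pure_nash_equilibria(p1, p2))  # prints: [(1, 1)]
```

An empty result means no pure-strategy equilibrium exists; in that case an equilibrium, if sought, would have to be found among mixed strategies, which this sketch does not compute.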
k-Shape: Efficient and Accurate Clustering of Time Series
John Paparrizos, L. Gravano
DOI: https://doi.org/10.1145/2723372.2737793
Published: 2015-05-27
Abstract: The proliferation and ubiquity of temporal data across many disciplines has generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data mining methods, not only due to its exploratory power, but also as a preprocessing step or subroutine for other techniques. In this paper, we present k-Shape, a novel algorithm for time-series clustering. k-Shape relies on a scalable iterative refinement procedure, which creates homogeneous and well-separated clusters. As its distance measure, k-Shape uses a normalized version of the cross-correlation measure in order to consider the shapes of time series while comparing them. Based on the properties of that distance measure, we develop a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters. To demonstrate the robustness of k-Shape, we perform an extensive experimental evaluation of our approach against partitional, hierarchical, and spectral clustering methods, with combinations of the most competitive distance measures. k-Shape outperforms all scalable approaches in terms of accuracy. Furthermore, k-Shape also outperforms all non-scalable (and hence impractical) combinations, with one exception that achieves similar accuracy results. However, unlike k-Shape, this combination requires tuning of its distance measure and is two orders of magnitude slower than k-Shape. Overall, k-Shape emerges as a domain-independent, highly accurate, and highly efficient clustering approach for time series with broad applications.
Citations: 265
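The abstract states only that k-Shape compares series with a normalized version of cross-correlation. The sketch below assumes coefficient normalization (dividing the raw cross-correlation by the product of the two series' Euclidean norms) on z-normalized inputs, and defines the distance as 1 minus the maximum normalized correlation over all shifts, which is how we understand the shape-based distance. Function names are hypothetical, the loop is a brute-force O(m^2) scan rather than an FFT-based computation, and k-Shape's centroid computation and iterative refinement are not shown.

```python
import math
from typing import List, Sequence, Tuple


def z_normalize(series: Sequence[float]) -> List[float]:
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0.0:
        return [0.0] * n  # a flat series carries no shape information
    return [(v - mean) / std for v in series]


def cross_correlation(x: Sequence[float], y: Sequence[float], shift: int) -> float:
    """Sum of x[i + shift] * y[i] over the indices where both series overlap."""
    m = len(x)
    return sum(x[i + shift] * y[i] for i in range(m) if 0 <= i + shift < m)


def shape_based_distance(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Return (1 - max normalized cross-correlation, shift achieving it).

    Scans all 2m - 1 shifts directly (O(m^2)); an FFT would make this O(m log m).
    """
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    m = len(x)
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    if norm == 0.0:
        return 1.0, 0  # assumption: treat an all-zero series as uncorrelated
    best_ncc, best_shift = -math.inf, 0
    for shift in range(-(m - 1), m):
        ncc = cross_correlation(x, y, shift) / norm
        if ncc > best_ncc:
            best_ncc, best_shift = ncc, shift
    return 1.0 - best_ncc, best_shift


x = z_normalize([0, 1, 2, 1, 0, 0])
y = z_normalize([0, 0, 1, 2, 1, 0])  # same bump, delayed by one step
dist, shift = shape_based_distance(x, y)
print(round(dist, 4), shift)  # prints: 0.1333 -1
```

Because the best shift is searched explicitly, two series with the same shape but a phase offset end up close together, which is the property the abstract highlights for comparing shapes.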
A Secure Search Engine for the Personal Cloud
Saliha Lallali, N. Anciaux, I. S. Popa, P. Pucheral
DOI: https://doi.org/10.1145/2723372.2735376
Published: 2015-05-27
Abstract: The emerging Personal Cloud paradigm holds the promise of a Privacy-by-Design storage and computing platform where personal data remain under the individual's control while being shared by valuable applications. However, leaving data management control in the user's hands pushes the security issues to the user's platform. This demonstration presents a Secure Personal Cloud Platform relying on a query and access control engine embedded in a tamper-resistant hardware device connected to the user's platform. The main difficulty lies in the design of an inverted document index and its related search and update algorithms capable of tackling the strong hardware constraints of these devices. We have implemented our engine on a real tamper-resistant hardware device and present its capacity to regulate access to a personal dataspace. The objective of this demonstration is to show (1) that secure hardware is a key enabler of the Personal Cloud paradigm and (2) that new embedded indexing and querying techniques can tackle the hardware constraints of tamper-resistant devices and provide scalable solutions for the Personal Cloud.
Citations: 14