2015 IEEE 31st International Conference on Data Engineering: Latest Publications

Dynamic interaction graphs with probabilistic edge decay

2015 IEEE 31st International Conference on Data Engineering Pub Date : 2015-10-23 DOI: 10.1109/ICDE.2015.7113363

Wenlei Xie, Yuanyuan Tian, Yannis Sismanis, Andrey Balmin, P. Haas

Abstract: A large-scale network of social interactions, such as mentions in Twitter, can often be modeled as a “dynamic interaction graph” in which new interactions (edges) are continually added over time. Existing systems for extracting timely insights from such graphs are based on either a cumulative “snapshot” model or a “sliding window” model. The former model does not sufficiently emphasize recent interactions. The latter model abruptly forgets past interactions, leading to discontinuities in which, e.g., the graph analysis completely ignores historically important influencers who have temporarily gone dormant. We introduce TIDE, a distributed system for analyzing dynamic graphs that employs a new “probabilistic edge decay” (PED) model. In this model, the graph analysis algorithm of interest is applied at each time step to one or more graphs obtained as samples from the current “snapshot” graph that comprises all interactions that have occurred so far. The probability that a given edge of the snapshot graph is included in a sample decays over time according to a user-specified decay function. The PED model allows controlled trade-offs between recency and continuity, and allows existing analysis algorithms for static graphs to be applied to dynamic graphs essentially without change. For the important class of exponential decay functions, we provide efficient methods that leverage past samples to incrementally generate new samples as time advances. We also exploit the large degree of overlap between samples to reduce memory consumption from O(N) to O(log N) when maintaining N sample graphs. Finally, we provide bulk-execution methods for applying graph algorithms to multiple sample graphs simultaneously without requiring any changes to existing graph-processing APIs. Experiments on a real Twitter dataset demonstrate the effectiveness and efficiency of our TIDE prototype, which is built on top of the Spark distributed computing framework.

Citations: 17
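
The probabilistic edge decay model described in the abstract above can be illustrated with a minimal Python sketch. This is not the TIDE implementation: the function names `sample_graph` and `advance_sample`, the `(source, target, timestamp)` edge tuples, and the `decay_rate` parameter are all assumptions made for illustration. The sketch assumes an exponential decay function and time steps of length 1.

```python
import math
import random


def sample_graph(edges, now, decay_rate, rng):
    """Draw one sample graph from the snapshot graph.

    Each edge (source, target, timestamp) is kept independently with
    probability exp(-decay_rate * age), where age = now - timestamp.
    """
    sample = []
    for source, target, timestamp in edges:
        age = now - timestamp
        keep_probability = math.exp(-decay_rate * age)
        if rng.random() < keep_probability:
            sample.append((source, target, timestamp))
    return sample


def advance_sample(previous_sample, new_edges, decay_rate, rng):
    """Age an existing sample by one time step, then add the new edges.

    Old edges survive the step with probability exp(-decay_rate).
    Edges that arrive at the new time step have age 0 and are always kept.
    """
    survive_probability = math.exp(-decay_rate)
    kept = [edge for edge in previous_sample if rng.random() < survive_probability]
    return kept + list(new_edges)
```

Because exponential decay is memoryless, thinning the previous sample by exp(-decay_rate) gives every old edge the same inclusion probability as resampling it from scratch. That is one plausible reading of the incremental sample generation the abstract mentions, not a description of the paper's actual method.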

Efficient structural bulk updates on the Pre/Dist/Size XML encoding

2015 IEEE 31st International Conference on Data Engineering Pub Date : 2015-04-13 DOI: 10.1109/ICDE.2015.7113305

L. Kircher, Michael Grossniklaus, C. Grün, M. Scholl

Citations: 10

Preserving privacy in social networks against connection fingerprint attacks

2015 IEEE 31st International Conference on Data Engineering Pub Date : 2015-04-13 DOI: 10.1109/ICDE.2015.7113272

Yazhe Wang, Baihua Zheng

Abstract: Existing work on identity privacy protection in social networks assumes that all user identities in a social network are private, ignoring the fact that many real-world social networks contain a considerable number of users, such as celebrities, media users, and organization users, whose identities are public. In this paper, we demonstrate that the presence of public users can cause serious damage to the identity privacy of other ordinary users. Motivated attackers can utilize the connection information of a user to some known public users to perform re-identification attacks, namely connection fingerprint (CFP) attacks. We propose two k-anonymization algorithms to protect a social network against CFP attacks. One algorithm is based on adding dummy vertices. It can resist powerful attackers who know a user's connections to the public users within n hops (n ≥ 1), and it protects the centrality utility of public users. The other algorithm is based on edge modification. It can only resist attackers who know a user's connections to the public users within 1 hop, but it preserves a rich spectrum of network utility. We perform comprehensive experiments on real-world networks and demonstrate that our algorithms are very efficient in terms of running time and are able to generate k-anonymized networks with good utility.

Citations: 20
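
The connection fingerprint idea in the abstract above can be sketched in Python under simplifying assumptions. Here a user's fingerprint is taken to be the set of public users within one hop, and a network counts as k-anonymous when every fingerprint is shared by at least k ordinary users. These definitions, the adjacency-dictionary representation, and the names `connection_fingerprints` and `is_k_anonymous` are illustrative assumptions. The sketch only checks the property; it does not reproduce the paper's anonymization algorithms.

```python
from collections import Counter


def connection_fingerprints(adjacency, public_users):
    """Map each ordinary user to the set of public users it links to directly.

    adjacency: dict mapping a user to the set of its neighbors.
    public_users: set of users whose identities are known to the attacker.
    """
    return {
        user: frozenset(neighbors & public_users)
        for user, neighbors in adjacency.items()
        if user not in public_users
    }


def is_k_anonymous(adjacency, public_users, k):
    """Return True if every 1-hop fingerprint is shared by at least k ordinary users."""
    fingerprints = connection_fingerprints(adjacency, public_users)
    group_sizes = Counter(fingerprints.values())
    return all(size >= k for size in group_sizes.values())
```

In this simplified view, a user whose fingerprint is unique can be re-identified by anyone who knows which public users that person is connected to.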

Scalable distributed transactions across heterogeneous stores

2015 IEEE 31st International Conference on Data Engineering Pub Date : 2015-04-13 DOI: 10.1109/ICDE.2015.7113278

Akon Dey, A. Fekete, Uwe Röhm

Abstract: Typical cloud computing systems provide highly scalable and fault-tolerant data stores that may sacrifice other features like general multi-item transaction support. Recent techniques to implement multi-item transactions in these types of systems have focused on transactions across homogeneous data stores. Since applications access data in heterogeneous storage systems for legacy or interoperability reasons, we propose an approach that enables multi-item transactions with snapshot isolation across multiple heterogeneous data stores using only a minimal set of commonly implemented features such as single-item consistency, conditional updates, and the ability to store additional metadata. We define a client-coordinated transaction commitment protocol that does not rely on a central coordinating infrastructure. The application can take advantage of the scalability and fault-tolerance characteristics of modern key-value stores and access existing data in them, and also have multi-item transactional access guarantees with little performance impact. We have implemented our design in a Java library called Cherry Garcia (CG) that supports data store abstractions for Windows Azure Storage (WAS), Google Cloud Storage (GCS), and our own high-performance key-value store called Tora.

Citations: 14
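
The abstract above names conditional updates as one of the few features the approach requires from each store. A minimal Python sketch of that primitive follows. `VersionedStore`, `put_if_version`, and `increment` are hypothetical names for an in-memory stand-in, not part of Cherry Garcia or of any real storage service, and the sketch does not implement the paper's commitment protocol.

```python
class VersionedStore:
    """Hypothetical in-memory key-value store with a conditional update."""

    def __init__(self):
        self._items = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); a missing key reads as version 0, value None."""
        return self._items.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        """Write value only if the stored version still equals expected_version."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False  # someone else updated the item first
        self._items[key] = (current_version + 1, value)
        return True


def increment(store, key):
    """Read-modify-write a counter, retrying until the conditional update succeeds."""
    while True:
        version, value = store.get(key)
        if store.put_if_version(key, version, (value or 0) + 1):
            return
```

A client can build on this kind of primitive to detect conflicting writers without a central coordinator, because a stale version makes the write fail instead of silently overwriting newer data.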

Scalable parallelization of skyline computation for multi-core processors

2015 IEEE 31st International Conference on Data Engineering Pub Date : 2015-04-13 DOI: 10.1109/ICDE.2015.7113358

S. Chester, Darius Sidlauskas, I. Assent, Kenneth S. Bøgh

Citations: 45

Fine-grained controversy detection in Wikipedia

2015 IEEE 31st International Conference on Data Engineering Pub Date : 2015-04-13 DOI: 10.1109/ICDE.2015.7113426

Siarhei Bykau, Flip Korn, D. Srivastava, Yannis Velegrakis

Abstract: The advent of Web 2.0 gave birth to a new kind of application where content is generated through the collaborative contribution of many different users. This form of content generation is believed to generate data of higher quality since the “wisdom of the crowds” makes its way into the data. However, a number of specific data quality issues appear within such collaboratively generated data. Apart from normal updates, there are cases of intentional harmful changes known as vandalism as well as naturally occurring disagreements on topics which don't have an agreed-upon viewpoint, known as controversies. While much work has focused on identifying vandalism, there has been little prior work on detecting controversies, especially at a fine granularity. Knowing about controversies when processing user-generated content is essential to understand the quality of the data and the trust that should be given to them. Controversy detection is a challenging task, since in the highly dynamic context of user updates, one needs to differentiate among normal updates, vandalism, and actual controversies. We describe a novel technique that finds these controversial issues by analyzing the edits that have been performed on the data over time. We apply the technique to Wikipedia, the world's largest known collaboratively generated database, and show that our approach has higher precision and recall than baseline approaches and is capable of finding previously unknown controversies.

Citations: 21

Dish comment summarization based on bilateral topic analysis

2015 IEEE 31st International Conference on Data Engineering Pub Date : 2015-04-13 DOI: 10.1109/ICDE.2015.7113308

Rong Zhang, Zhenjie Zhang, Xiaofeng He, Aoying Zhou

Abstract: With the prosperity of online services enabled by Web 2.0, a huge amount of human-generated commentary data is now available on the Internet, covering a wide range of domains and products. Such comments contain valuable information for other customers, but are usually difficult to utilize due to the lack of a common description structure, the complexity of opinion expression, and fast-growing data volume. Comment-based restaurant summarization is even more challenging than for other types of products and services, as users' comments on restaurants usually mix opinions on different dishes but carry only one overall evaluation score for the whole experience with the restaurant. It is thus crucial to distinguish well-made dishes from lousy ones by mining the comment archive, in order to generate meaningful and useful summaries for other potential customers. This paper presents a novel approach to the problem of restaurant comment summarization, built on a new bilateral topic analysis model over the commentary text data. In the bilateral topic model, the attributes discussed in the comments on the dishes and the user's evaluation of those attributes are considered as two independent dimensions in the latent space. Combined with new opinionated word extraction and clustering-based representation selection algorithms, our new analysis technique is effective at generating high-quality summaries using representative snippets from the text comments. We evaluate our proposals on two real-world comment archives crawled from the most popular English and Chinese online restaurant review web sites, Yelp and Dianping. The experimental results verify that our proposals outperform baseline approaches in the literature by a large margin in summarization quality.

Citations: 8

Blind men and an elephant coalescing open-source, academic, and industrial perspectives on BigData

2015 IEEE 31st International Conference on Data Engineering Pub Date : 2015-04-13 DOI: 10.1109/ICDE.2015.7113417

C. Douglas, C. Curino

Abstract: This tutorial is organized in two parts. In the first half, we will present an overview of applications and services in the BigData ecosystem. We will use known distributed database and systems literature as landmarks to orient the attendees in this fast-evolving space. Throughout, we will contrast models of resource management, performance, and the constraints that shape the architectures of prominent systems. We will also discuss the role of academia and industry in the development of open-source infrastructure, with an emphasis on open problems and strategies for collaboration. We assume only basic familiarity with distributed systems. In the second half, we will delve into Apache Hadoop YARN. YARN (Yet Another Resource Negotiator) transformed Hadoop from a MapReduce engine to a general-purpose cluster scheduler. Since its introduction, it has been deployed in production and extended to support use cases beyond large-scale batch processing. The tutorial will present the active research and development supporting such heterogeneous workloads, with particular attention to multi-tenant scheduling. Topics include security, resource isolation, protocols, and preemption. This portion will be detailed, but accessible to anyone with a background in distributed systems and all attendees of the first half of the tutorial.

Citations: 2

Clustering to forecast sparse time-series data

2015 IEEE 31st International Conference on Data Engineering Pub Date : 2015-04-13 DOI: 10.1109/ICDE.2015.7113385

Abhaya Jha, S. Ray, Brian Seaman, I. Dhillon

Abstract: Forecasting accurately is essential to successful inventory planning in retail. Unfortunately, there is not always enough historical data to forecast items individually; this is particularly true in e-commerce where, unlike in physical stores, there is a long tail of low-selling items, and items are introduced and phased out quite frequently. In such scenarios, it is preferable to forecast items in well-designed groups of similar items, so that data for different items can be pooled together to fit a single model. In this paper, we first discuss the desiderata for such a grouping and how it differs from the traditional clustering problem. We then describe our approach, which is a scalable local search heuristic that can naturally handle the constraints required in this setting, besides being capable of producing solutions competitive with well-known clustering algorithms. We also address the complementary problem of estimating similarity, particularly in the case of new items which have no past sales. Our solution is to regress the sales profile of items against their semantic features, so that given just the semantic features of a new item we can predict its relation to other items, in terms of as-yet-unobserved sales. Our experiments demonstrate both the scalability of our approach and implications for forecast accuracy.

Citations: 22

Big data: Old wine in new bottle?

2015 IEEE 31st International Conference on Data Engineering Pub Date : 2015-04-13 DOI: 10.1109/ICDE.2015.7113428

R. Agrawal

Citations: 1