2016 IEEE 32nd International Conference on Data Engineering (ICDE): latest papers

Dark Data: Are we solving the right problems?

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498366

Michael J. Cafarella, I. Ilyas, Marcel Kornacker, Tim Kraska, C. Ré

Citations: 11

Link prediction in graph streams

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498270

Peixiang Zhao, C. Aggarwal, Gewen He

Abstract: Link prediction is a fundamental problem that aims to estimate the likelihood that edges (links) exist, based on the currently observed structure of a graph, and has found numerous applications in social networks, bioinformatics, e-commerce, and the Web. In many real-world scenarios, however, graphs are massive and evolve dynamically at a fast rate; without loss of generality, they are often modeled and interpreted as graph streams. Existing link prediction methods fail to generalize to the graph stream setting because the graph snapshots on which link prediction is performed are no longer readily available in memory, or even on disk, for effective graph computation and analysis. It is therefore highly desirable, albeit challenging, to support link prediction online and dynamically; in this paper we refer to this as the streaming link prediction problem in graph streams. We consider three fundamental neighborhood-based link prediction target measures, the Jaccard coefficient, common neighbors, and Adamic-Adar, and provide accurate estimates of them to address the streaming link prediction problem. Our main idea is to design cost-effective graph sketches (constant space per vertex) based on MinHash and vertex-biased sampling techniques, and to propose efficient sketch-based algorithms (constant time per edge) with both theoretical accuracy guarantees and robust estimation results. We carry out experimental studies on a series of real-world graph streams. The results demonstrate that our graph-sketch-based methods are accurate, efficient, and cost-effective, and thus can be practically employed for link prediction in real-world graph streams.

Pages: 553-564

Citations: 37
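The three target measures named in this abstract have standard definitions for vertices u and v with neighbor sets N(u) and N(v): common neighbors is |N(u) ∩ N(v)|, the Jaccard coefficient is |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, and Adamic-Adar is the sum over common neighbors w of 1 / log|N(w)|. The Python sketch below computes them exactly and estimates Jaccard with plain k-hash MinHash (k running minima per vertex, so constant space per vertex and O(k) work per edge for fixed k), then recovers common neighbors through the identity |A ∩ B| = J(|A| + |B|) / (1 + J). It is an illustration under assumptions, not the paper's algorithm: the class and function names, k = 256, the blake2b-based hash family, and the assumption that each undirected edge arrives exactly once are all hypothetical, and the paper's vertex-biased sampling (and its Adamic-Adar estimator) is not reproduced.

```python
import hashlib
import math
from collections import defaultdict


def minhash(i, x):
    """i-th pseudo-random hash of vertex x (blake2b over "i:x", 64-bit output)."""
    digest = hashlib.blake2b(f"{i}:{x}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class MinHashNeighborSketch:
    """Hypothetical per-vertex sketch: k minimum hash values per vertex, O(k) work per edge."""

    def __init__(self, k=256):
        self.k = k
        self.sig = {}                   # vertex -> k running minima over its neighbors
        self.degree = defaultdict(int)  # assumes each undirected edge arrives only once

    def add_edge(self, u, v):
        for src, nbr in ((u, v), (v, u)):
            self.degree[src] += 1
            sig = self.sig.setdefault(src, [math.inf] * self.k)
            for i in range(self.k):
                h = minhash(i, nbr)
                if h < sig[i]:
                    sig[i] = h

    def jaccard(self, u, v):
        """Fraction of slots in which u and v hold the same minimum."""
        if u not in self.sig or v not in self.sig:
            return 0.0
        return sum(a == b for a, b in zip(self.sig[u], self.sig[v])) / self.k

    def common_neighbors(self, u, v):
        """Estimate via the identity |A & B| = J * (|A| + |B|) / (1 + J)."""
        j = self.jaccard(u, v)
        return j * (self.degree[u] + self.degree[v]) / (1 + j)


def exact_measures(adj, u, v):
    """Exact (common neighbors, Jaccard, Adamic-Adar) from full adjacency sets."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    jac = len(common) / len(union) if union else 0.0
    # skip degree-1 vertices (possible only when u == v) to avoid dividing by log(1) = 0
    aa = sum(1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1)
    return len(common), jac, aa


# Usage: a toy stream where N(1) = {3, 4, 5} and N(2) = {3, 4, 6}.
edges = [(1, 3), (2, 3), (1, 4), (2, 4), (1, 5), (2, 6)]
sketch, adj = MinHashNeighborSketch(k=256), defaultdict(set)
for u, v in edges:                 # a single pass over the stream
    sketch.add_edge(u, v)
    adj[u].add(v)                  # exact adjacency is kept only for comparison
    adj[v].add(u)
print(exact_measures(adj, 1, 2))   # 2 common neighbors, Jaccard 0.5, Adamic-Adar ~2.885
print(sketch.jaccard(1, 2), sketch.common_neighbors(1, 2))  # estimates near 0.5 and 2
```

The sketch never stores adjacency lists, which is what makes it fit a stream; its accuracy is governed by k, since the Jaccard estimate is an average of k indicator variables.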

SPORE: A sequential personalized spatial item recommender system

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498304

Weiqing Wang, Hongzhi Yin, S. Sadiq, Ling Chen, M. Xie, Xiaofang Zhou

Abstract: With the rapid development of location-based social networks (LBSNs), spatial item recommendation has become an important way of helping users discover interesting locations and increasing their engagement with location-based services. Although human movement exhibits sequential patterns in LBSNs, most current studies on spatial item recommendation do not consider the sequential influence of locations. Leveraging sequential patterns in spatial item recommendation is, however, very challenging, because 1) users' check-in data in LBSNs has a low sampling rate in both space and time, which renders existing prediction techniques for GPS trajectories ineffective; 2) the prediction space is extremely large, with millions of distinct locations as the next prediction target, which impedes the application of classical Markov chain models; and 3) no existing framework unifies users' personal interests and sequential influence in a principled manner. In light of these challenges, we propose a sequential personalized spatial item recommendation framework (SPORE), which introduces a novel latent variable, topic-region, to model and fuse sequential influence with personal interests in the latent and exponential space. The advantages of modeling the sequential effect at the topic-region level include a significantly reduced prediction space, effective alleviation of data sparsity, and a direct expression of the semantic meaning of users' spatial activities. Furthermore, we design an asymmetric Locality Sensitive Hashing (ALSH) technique, extending traditional LSH, to speed up online top-k recommendation. We evaluate SPORE on two real datasets and one large-scale synthetic dataset. The results demonstrate a significant improvement in SPORE's ability to recommend spatial items, in terms of both effectiveness and efficiency, compared with state-of-the-art methods.

Pages: 954-965

Citations: 93
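Top-k recommendation over latent user and item vectors is a maximum inner product search (MIPS) problem, which standard LSH does not handle directly because inner product lacks the properties LSH relies on (a vector need not even be its own best match). The abstract does not detail SPORE's ALSH, so the Python sketch below substitutes one well-known asymmetric scheme, the simple-LSH transform of Neyshabur and Srebro: items are scaled into the unit ball and padded to unit norm, queries are normalized and padded with 0, after which ordinary sign-random-projection (angular) LSH applies. This is a swapped-in technique, not the paper's method; the random item vectors, `bits=4`, `tables=8`, and all names are hypothetical.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def preprocess_items(items):
    """P(x): scale every item by the largest norm, then append sqrt(1 - ||x||^2) so ||P(x)|| = 1."""
    m = max(norm(x) for x in items)
    out = []
    for x in items:
        scaled = [c / m for c in x]
        out.append(scaled + [math.sqrt(max(0.0, 1.0 - sum(c * c for c in scaled)))])
    return out, m


def transform_query(q):
    """Q(q): normalize the query and append 0, so P(x)·Q(q) = (x·q) / (m * ||q||)."""
    n = norm(q)
    return [c / n for c in q] + [0.0]


class SignProjectionLSH:
    """Angular LSH: each table keys a vector by the signs of `bits` random projections."""

    def __init__(self, dim, bits=4, tables=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
                       for _ in range(tables)]
        self.tables = [{} for _ in range(tables)]

    def _key(self, t, v):
        return tuple(sum(a * b for a, b in zip(p, v)) >= 0 for p in self.planes[t])

    def insert(self, item_id, v):
        for t, table in enumerate(self.tables):
            table.setdefault(self._key(t, v), []).append(item_id)

    def candidates(self, v):
        found = set()
        for t, table in enumerate(self.tables):
            found.update(table.get(self._key(t, v), ()))
        return found


def top_k(items, index, q, k):
    """Probe the index with Q(q), then re-rank the candidates by the exact inner product."""
    def score(i):
        return sum(a * b for a, b in zip(items[i], q))
    return sorted(index.candidates(transform_query(q)), key=score, reverse=True)[:k]


# Usage with hypothetical 8-dimensional latent vectors.
rng = random.Random(42)
items = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
P, max_norm = preprocess_items(items)
index = SignProjectionLSH(dim=9)
for i, p in enumerate(P):
    index.insert(i, p)
user = [rng.gauss(0.0, 1.0) for _ in range(8)]
print(top_k(items, index, user, 5))  # approximate: a true top-5 item can be missed
```

The asymmetry is the point of the design: items and queries pass through different transforms, so the cosine between P(x) and Q(q) is monotone in x·q, and exact re-ranking of the candidates means the hashing trades only recall for speed.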

Revenue maximization by viral marketing: A social network host's perspective

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498227

Arijit Khan, Benjamin Zehnder, Donald Kossmann

Abstract: We study the novel problem of revenue maximization for a social network host that sells viral marketing campaigns to multiple competing campaigners. Each client campaigner informs the host of her target users in the network, and of how much money she is willing to pay the host if one of her target users buys her product. The host, in turn, assigns a set of seed users to each client campaigner. A campaigner's seed set is a limited number of users to whom the campaigner provides free samples, discounted prices, etc., with the expectation that these seed users will buy her product and will also influence many of her target users to buy it. Because of various product-adoption costs, an average user is very unlikely to purchase more than one of the competing products. Therefore, from the host's perspective, it is important to assign seed users to client campaigners so that the assignment maximizes the host's aggregate revenue across all her client campaigners. We formulate the problem under two well-established influence cascade models: the independent cascade model and the linear threshold model. Although the problem is NP-hard under both models, and is neither monotone nor submodular, we develop approximation algorithms with theoretical performance guarantees. Because these approximation algorithms often incur long running times, we also design efficient heuristic methods that empirically perform as well as the approximation algorithms. Our detailed experimental evaluation attests that the proposed techniques are effective and scalable on real-world datasets.

Pages: 37-48

Citations: 34
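Both cascade models named in this abstract are usually evaluated by Monte Carlo simulation. The Python sketch below simulates one campaigner's spread under the independent cascade (each newly activated user gets one chance to activate each out-neighbor with an edge probability) and the linear threshold model (a user activates once the summed weight of active in-neighbors reaches a uniformly random threshold), and estimates revenue as the payment per adopter times the expected number of activated target users, counting seeds that are targets as adopters. This is a simplified illustration under assumptions: the graph format, probabilities, weights, and function names are hypothetical, and it deliberately ignores competition between campaigners and the seed-assignment problem, which are the paper's actual subject.

```python
import random


def simulate_ic(out_edges, seeds, rng):
    """One independent-cascade run: each newly active u gets a single chance to
    activate each inactive out-neighbor v, succeeding with probability p(u, v)."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in out_edges.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active


def simulate_lt(in_edges, seeds, rng):
    """One linear-threshold run: v activates once the total weight of its active
    in-neighbors reaches a threshold drawn uniformly from (0, 1]."""
    thresholds = {v: 1.0 - rng.random() for v in in_edges}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, preds in in_edges.items():
            if v not in active and sum(w for u, w in preds if u in active) >= thresholds[v]:
                active.add(v)
                changed = True
    return active


def expected_revenue(simulate, edges, seeds, targets, pay_per_adopter, runs=2000, seed=1):
    """Monte Carlo estimate of pay_per_adopter * E[number of activated target users]."""
    rng = random.Random(seed)
    hits = sum(len(simulate(edges, seeds, rng) & targets) for _ in range(runs))
    return pay_per_adopter * hits / runs


# Usage on a hypothetical four-user network with seed "a" and targets {"b", "d"}.
out_edges = {"a": [("b", 0.5), ("c", 0.2)], "b": [("d", 0.4)]}
in_edges = {"b": [("a", 0.6)], "c": [("a", 0.3)], "d": [("b", 0.5), ("c", 0.5)]}
targets = {"b", "d"}
print(expected_revenue(simulate_ic, out_edges, {"a"}, targets, 10))  # close to 10 * (0.5 + 0.2) = 7
print(expected_revenue(simulate_lt, in_edges, {"a"}, targets, 10))   # close to 10 * (0.6 + 0.45) = 10.5
```

In the linear threshold model each user's incoming weights must sum to at most 1, which the toy `in_edges` respects; with a uniform threshold, a user whose active in-weight is w activates with probability exactly w.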

SPDO: High-throughput road distance computations on Spark using Distance Oracles

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498328

Shangfu Peng, Jagan Sankaranarayanan, H. Samet

Citations: 18

Beat the DIVa - decentralized identity validation for online social networks

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498337

Leila Bahri, Amira Soliman, Jacopo Squillaci, B. Carminati, E. Ferrari, Sarunas Girdzijauskas

Abstract: Fake accounts in online social networks (OSNs) have grown considerably more sophisticated and now attempt to gain network trust by infiltrating honest communities. Honest users have limited insight into the truthfulness of new online identities requesting their friendship, which makes it easier for fake accounts to deceive honest users into befriending them. To address this, we have proposed a model that learns hidden correlations between profile attributes within OSN communities and exploits them to help users estimate the trustworthiness of new profiles. To demonstrate our method, this demo presents a game application in which players try to cheat the system and convince nodes in a simulated OSN to befriend them. The game deploys different strategies to challenge the players and to meet the objectives of the demo: to make participants aware of how fake accounts can infiltrate their OSN communities, to demonstrate how our method can help mitigate this threat, and eventually to strengthen our model using data collected from the players' moves.

Pages: 1330-1333

Citations: 0

Efficient answering of why-not questions in similar graph matching

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498382

Md. Saiful Islam, Chengfei Liu, Jianxin Li

Abstract: Graph data management and similar-graph matching are very important for many applications, including bioinformatics, computer vision, VLSI design, bug localization, road networks, and social and communication networks. Many graph indexing and similarity matching techniques have been proposed for managing and querying graph data. In similar graph matching, the user receives the database graphs whose distances from the query graph are below a threshold. In such a setting, the user may not receive certain database graphs that are very similar to the query graph if the initial query graph is inappropriate or imperfect for the expected answer set. For example, consider a drug designer who is looking for chemical compounds that could be targets of her hypothetical drug before realizing it. In response to her query, a traditional search system may return the database structures that are most similar to the query graph. However, she may be surprised if some of the expected targets are missing from the answer set. She may then seek the system's help by asking, "Is there another query graph that can match my expected answer set?" The system may then modify her initial query graph so that the new answer set includes the missing answers. Here, we study this problem of answering why-not questions in similar graph matching for graph databases.

Pages: 1476-1477

Citations: 0
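The why-not setting in this abstract can be made concrete with a toy threshold query. The abstract does not specify the graph distance or how the query is modified, so the Python sketch below assumes graphs share vertex identities and uses edge-edit distance (the size of the symmetric difference of the edge sets), a strong simplification of general graph edit distance. It shows the answer set, which expected graphs are missing and how far they are from the query, and a check that a hand-picked modified query returns every expected graph; the modified query is chosen by hand for illustration and is not produced by the paper's method. All names and graphs are hypothetical.

```python
def edge_set(pairs):
    """Undirected edges over shared vertex ids; frozensets make (a, b) equal to (b, a)."""
    return frozenset(frozenset(p) for p in pairs)


def distance(g1, g2):
    """Edge-edit distance under a fixed vertex correspondence: |E1 symmetric-difference E2|."""
    return len(g1 ^ g2)


def similarity_query(db, q, tau):
    """Ids of database graphs whose distance from the query is below tau."""
    return {gid for gid, g in db.items() if distance(g, q) < tau}


def why_not(db, q, tau, expected):
    """Map each expected graph missing from the answer set to its distance from q."""
    missing = set(expected) - similarity_query(db, q, tau)
    return {gid: distance(db[gid], q) for gid in missing}


db = {
    "g1": edge_set([("a", "b"), ("b", "c")]),
    "g2": edge_set([("a", "b"), ("b", "c"), ("c", "d")]),
    "g3": edge_set([("a", "b"), ("c", "d"), ("d", "e")]),
}
q = edge_set([("a", "b"), ("b", "c")])
expected = {"g2", "g3"}
print(similarity_query(db, q, 2))       # g1 and g2 (distances 0 and 1)
print(why_not(db, q, 2, expected))      # {'g3': 3}
q_new = edge_set([("a", "b"), ("c", "d")])  # hypothetical hand-picked modified query
print(why_not(db, q_new, 2, expected))  # {} : every expected graph is now returned
```

Note that the modified query also drops g1 from the answer set, which illustrates why answering a why-not question is a search over query graphs rather than a simple threshold relaxation.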

Topical influence modeling via topic-level interests and interactions on social curation services

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498225

Daehoon Kim, Jae-Gil Lee, B. Lee

Abstract: Social curation services are emerging social media platforms that let users curate content by topic and express their interests at the topic level by following other users' curated collections rather than the users themselves. The topic-level information revealed through this feature far exceeds what existing methods obtain from traditional social networking services, and can greatly enhance the quality of topic-sensitive influence modeling. In this paper, we propose a novel model, topical influence with social curation (TISC), to find influential users on social curation services. This model, formulated as a continuous conditional random field, takes full advantage of the explicitly available topic-level information reflected in both content and interactions. To validate its merits, we comprehensively compare TISC with state-of-the-art models on two real-world datasets collected from Pinterest and Scoop.it. The results show that TISC achieves up to around 80% higher accuracy and finds more convincing results in case studies than the other models. Moreover, we develop a distributed learning algorithm on Spark and demonstrate its excellent scalability on a cluster of 48 cores.

Pages: 13-24

Citations: 8

Analyzing data-centric applications: Why, what-if, and how-to

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498289

P. Bourhis, Daniel Deutch, Y. Moskovitch

Citations: 11

Microblogs data management and analysis

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498365

A. Magdy, M. Mokbel

Abstract: Microblog data, e.g., tweets, reviews, news comments, and social media comments, has gained considerable attention in recent years due to its popularity and rich content. Microblog applications now span a wide spectrum of interests, including detecting and analyzing events, user analysis for geo-targeted ads and political elections, and critical applications such as discovering health issues and rescue services. Consequently, major research efforts have been devoted to analyzing and managing microblog data to support these applications. In this tutorial, we give a 1.5-hour overview of microblog data analysis, management, and systems. The tutorial comprehensively reviews research efforts that analyze microblog content to build new functionality and use cases on top of it. In addition, it reviews existing research that proposes core data management components to support microblog queries at scale. Finally, it reviews system-level issues and ongoing work on supporting microblog data through emerging big data systems. Throughout its parts, the tutorial highlights the challenges and opportunities in microblog data research.

Pages: 1440-1443

Citations: 1