2016 IEEE 32nd International Conference on Data Engineering (ICDE)最新文献

Data profiling 数据概要分析

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2017-05-09 DOI: 10.1145/3035918.3054772

Ziawasch Abedjan, Lukasz Golab, Felix Naumann

引用次数: 104

TemProRA: Top-k temporal-probabilistic results analysis temproora: Top-k时间概率结果分析

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498350

K. Papaioannou, Michael H. Böhlen

引用次数: 7

Durable graph pattern queries on historical graphs 对历史图的持久图模式查询

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498269

Konstantinos Semertzidis, E. Pitoura

引用次数: 43

QB2OLAP: Enabling OLAP on Statistical Linked Open Data QB2OLAP:在统计关联开放数据上启用OLAP

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498341

Jovan Varga, Lorena Etcheverry, A. Vaisman, Oscar Romero, T. Pedersen, Christian Thomsen

引用次数: 21

Crowdsourced POI labelling: Location-aware result inference and Task Assignment 众包POI标签:位置感知结果推理和任务分配

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498229

Huiqi Hu, Yudian Zheng, Z. Bao, Guoliang Li, Jianhua Feng, Reynold Cheng

{"title":"Crowdsourced POI labelling: Location-aware result inference and Task Assignment","authors":"Huiqi Hu, Yudian Zheng, Z. Bao, Guoliang Li, Jianhua Feng, Reynold Cheng","doi":"10.1109/ICDE.2016.7498229","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498229","url":null,"abstract":"Identifying the labels of points of interest (POIs), aka POI labelling, provides significant benefits in location-based services. However, the quality of raw labels manually added by users or generated by artificial algorithms cannot be guaranteed. Such low-quality labels decrease the usability and result in bad user experiences. In this paper, by observing that crowdsourcing is a best-fit for computer-hard tasks, we leverage crowdsourcing to improve the quality of POI labelling. To our best knowledge, this is the first work on crowdsourced POI labelling tasks. In particular, there are two sub-problems: (1) how to infer the correct labels for each POI based on workers' answers, and (2) how to effectively assign proper tasks to workers in order to make more accurate inference for next available workers. To address these two problems, we propose a framework consisting of an inference model and an online task assigner. The inference model measures the quality of a worker on a POI by elaborately exploiting (i) worker's inherent quality, (ii) the spatial distance between the worker and the POI, and (iii) the POI influence, which can provide reliable inference results once a worker submits an answer. As workers are dynamically coming, the online task assigner judiciously assigns proper tasks to them so as to benefit the inference. The inference model and task assigner work alternately to continuously improve the overall quality. We conduct extensive experiments on a real crowdsourcing platform, and the results on two real datasets show that our method significantly outperforms state-of-the-art approaches.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"14 1","pages":"61-72"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85945105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 90

Blocking for large-scale Entity Resolution: Challenges, algorithms, and practical examples 大规模实体解析的阻塞:挑战、算法和实际示例

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498364

G. Papadakis, Themis Palpanas

{"title":"Blocking for large-scale Entity Resolution: Challenges, algorithms, and practical examples","authors":"G. Papadakis, Themis Palpanas","doi":"10.1109/ICDE.2016.7498364","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498364","url":null,"abstract":"Entity Resolution constitutes one of the cornerstone tasks for the integration of overlapping information sources. Due to its quadratic complexity, a large amount of research has focused on improving its efficiency so that it scales to Web Data collections, which are inherently voluminous and highly heterogeneous. The most common approach for this purpose is blocking, which clusters similar entities into blocks so that the pair-wise comparisons are restricted to the entities contained within each block. In this tutorial, we take a close look on blocking-based Entity Resolution, starting from the early blocking methods that were crafted for database integration. We highlight the challenges posed by contemporary heterogeneous, noisy, voluminous Web Data and explain why they render inapplicable these schema-based techniques. We continue with the presentation of blocking methods that have been developed for large-scale and heterogeneous information and are suitable for Web Data collections. We also explain how their efficiency can be further improved by meta-blocking and parallelization techniques. We conclude with a hands-on session that demonstrates the relative performance of several, state-of-the-art techniques. The participants of the tutorial will put in practice all the topics discussed in the theory part, and will get familiar with a reference toolbox, which includes the most prominent techniques in the area and can be readily used to tackle Entity Resolution problems.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"19 1","pages":"1436-1439"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84183896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 13

QPlain: Query by explanation QPlain:按解释查询

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498344

Daniel Deutch, Amir Gilad

引用次数: 22

Reputation aggregation in peer-to-peer network using differential gossip algorithm 基于差分八卦算法的点对点网络声誉聚合

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498426

Ruchir Gupta, Y. N. Singh

引用次数: 0

Fast motif discovery in short sequences 在短序列中快速发现motif

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498321

Honglei Liu, Fangqiu Han, Hongjun Zhou, Xifeng Yan, K. Kosik

{"title":"Fast motif discovery in short sequences","authors":"Honglei Liu, Fangqiu Han, Hongjun Zhou, Xifeng Yan, K. Kosik","doi":"10.1109/ICDE.2016.7498321","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498321","url":null,"abstract":"Motif discovery in sequence data is fundamental to many biological problems such as antibody biomarker identification. Recent advances in instrumental techniques make it possible to generate thousands of protein sequences at once, which raises a big data issue for the existing motif finding algorithms: They either work only in a small scale of several hundred sequences or have to trade accuracy for efficiency. In this work, we demonstrate that by intelligently clustering sequences, it is possible to significantly improve the scalability of all the existing motif finding algorithms without losing accuracy at all. An anchor based sequence clustering algorithm (ASC) is thus proposed to divide a sequence dataset into multiple smaller clusters so that sequences sharing the same motif will be located into the same cluster. Then an existing motif finding algorithm can be applied to each individual cluster to generate motifs. In the end, the results from multiple clusters are merged together as final output. Experimental results show that our approach is generic and orders of magnitude faster than traditional motif finding algorithms. It can discover motifs from protein sequences in the scale that no existing algorithm can handle. In particular, ASC reduces the running time of a very popular motif finding algorithm, MEME, from weeks to a few minutes with even better accuracy.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"10 1","pages":"1158-1169"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86624224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 16

Influence based cost optimization on user preference 基于影响的用户偏好成本优化

2016 IEEE 32nd International Conference on Data Engineering (ICDE) Pub Date : 2016-05-16 DOI: 10.1109/ICDE.2016.7498283

Jianye Yang, Ying Zhang, W. Zhang, Xuemin Lin

{"title":"Influence based cost optimization on user preference","authors":"Jianye Yang, Ying Zhang, W. Zhang, Xuemin Lin","doi":"10.1109/ICDE.2016.7498283","DOIUrl":"https://doi.org/10.1109/ICDE.2016.7498283","url":null,"abstract":"The popularity of e-business and preference learning techniques have contributed a huge amount of product and user preference data. Analyzing the influence of an existing or new product among the users is critical to unlock the great scientific and social-economic value of these data. In this paper, we advocate the problem of influence-based cost optimization for the user preference and product data, which is fundamental in many real applications such as marketing and advertising. Generally, we aim to find a cost optimal position for a new product such that it can attract at least k or a particular percentage of users for the given user preference functions and competitors' products. Although we show the solution space of our problem can be reduced to a finite number of possible positions (points) by utilizing the classical k-level computation techniques, the computation cost is still very expensive due to the nature of the high combinatorial complexity of the k-level problem. To alleviate this issue, we develop efficient pruning and query processing techniques to significantly improve the performance. In particular, our traverse-based 2-dimensional algorithm is very efficient with time complexity O(n) where n is the number of user preference functions. For general multi-dimensional spaces, we develop space partition based algorithm to significantly improve the performance by utilizing cost-based, influence-based and local dominance based pruning techniques. Then, we show that the performance of the partition based algorithm can be further enhanced by utilizing sampling approach, where the problem can be reduced to the classical half-space intersection problem. We demonstrate the efficiency of our techniques with extensive experiments over real and synthetic datasets.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"15 1","pages":"709-720"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76677213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8