2020 IEEE 36th International Conference on Data Engineering (ICDE): Latest Papers

Efficient Structural Clustering in Large Uncertain Graphs
2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00215
Yongjiang Liang, Tingting Hu, Peixiang Zhao
{"title":"Efficient Structural Clustering in Large Uncertain Graphs","authors":"Yongjiang Liang, Tingting Hu, Peixiang Zhao","doi":"10.1109/ICDE48307.2020.00215","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00215","url":null,"abstract":"Clustering uncertain graphs based on the probabilistic graph model has sparked extensive research and widely varying applications. Existing structural clustering methods rely heavily on the computation of pairwise reliable structural similarity between vertices, which has proven to be extremely costly, especially in large uncertain graphs. In this paper, we develop a new, decomposition-based method, ProbSCAN, for efficient reliable structural similarity computation with theoretically improved complexity. We further design a cost-effective index structure, UCNO-Index, and a series of powerful pruning strategies to expedite reliable structural similarity computation in uncertain graphs. Experimental studies on eight real-world uncertain graphs demonstrate the effectiveness of our proposed solutions, which achieve orders of magnitude improvement in clustering efficiency compared with the state-of-the-art structural clustering methods in large uncertain graphs.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"32 1","pages":"1966-1969"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79769453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
SUDAF: Sharing User-Defined Aggregate Functions
2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00161
Chao Zhang, F. Toumani, B. Doreau
{"title":"SUDAF: Sharing User-Defined Aggregate Functions","authors":"Chao Zhang, F. Toumani, B. Doreau","doi":"10.1109/ICDE48307.2020.00161","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00161","url":null,"abstract":"We present SUDAF (Sharing User-Defined Aggregate Functions), a declarative framework that allows users to formulate UDAFs as mathematical expressions and use them in SQL statements. SUDAF rewrites partial aggregates of UDAFs using built-in aggregate functions and supports efficient dynamic caching and reusing of partial aggregates. Our evaluation shows that using SUDAF on top of Spark SQL can lead to an improvement of one to two orders of magnitude in query execution times compared to the original Spark SQL.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"6 1","pages":"1750-1753"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84741099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Outdated Fact Detection in Knowledge Bases
2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00196
Shuang Hao, Chengliang Chai, Guoliang Li, N. Tang, Ning Wang, Xiang Yu
{"title":"Outdated Fact Detection in Knowledge Bases","authors":"Shuang Hao, Chengliang Chai, Guoliang Li, N. Tang, Ning Wang, Xiang Yu","doi":"10.1109/ICDE48307.2020.00196","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00196","url":null,"abstract":"Knowledge bases (KBs), which store high-quality information, are crucial for many applications, such as enhancing search results and serving as external sources for data cleaning. Not surprisingly, there exist outdated facts in most KBs due to the rapid change of information. Naturally, it is important to keep KBs up-to-date. Traditional wisdom has investigated the problem of using reference data (such as new facts extracted from the news) to detect outdated facts in KBs. However, existing approaches can only cover a small percentage of facts in KBs. In this paper, we propose a novel human-in-the-loop approach for outdated fact detection in KBs. It trains a binary classifier using features such as historical update frequency and existence time of a fact to compute the likelihood of a fact in a KB to be outdated. Then, it interacts with humans to verify whether a fact with high likelihood is indeed outdated. In addition, it also uses logical rules to detect more outdated facts based on human feedback. The outdated facts detected by the logical rules will also be fed back to train the ML model further for data augmentation. Extensive experiments on real-world KBs, such as Yago and DBpedia, show the effectiveness of our solution.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"76 1","pages":"1890-1893"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84934452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
A Class of R*-tree Indexes for Spatial-Visual Search of Geo-tagged Street Images
2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00221
Abdullah Alfarrarjeh, S. H. Kim, V. Hegde, Akshansh, C. Shahabi, Q. Xie, S. Ravada
{"title":"A Class of R*-tree Indexes for Spatial-Visual Search of Geo-tagged Street Images","authors":"Abdullah Alfarrarjeh, S. H. Kim, V. Hegde, Akshansh, C. Shahabi, Q. Xie, S. Ravada","doi":"10.1109/ICDE48307.2020.00221","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00221","url":null,"abstract":"Due to the prevalence of GPS-equipped cameras (e.g., smartphones and surveillance cameras), massive amounts of geo-tagged images capturing urban streets are increasingly being collected. Consequently, many smart city applications have emerged, relying on efficient image search. Such searches include spatial-visual queries in which spatial and visual properties are used in tandem to retrieve similar images to a given query image within a given geographical region. Towards this end, new index structures that organize images based on both spatial and visual properties are needed to efficiently execute such queries. Based on our observation that street images are typically similar in the same spatial locality, index structures for spatial-visual queries can be effectively built on a spatial index (i.e., R*-tree). Therefore, we propose a class of R*-tree indexes, particularly, by associating each node with two separate minimum bounding rectangles (MBR), one for spatial and the other for (dimension-reduced) visual properties of their contained images, and adapting the R*-tree optimization criteria to both property types.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"49 1","pages":"1990-1993"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85035560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Graph Embeddings for One-pass Processing of Heterogeneous Queries
2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00222
Chi Thang Duong, Hongzhi Yin, Dung Hoang, Minn Hung Nguyen, M. Weidlich, Quoc Viet Hung Nguyen, K. Aberer
{"title":"Graph Embeddings for One-pass Processing of Heterogeneous Queries","authors":"Chi Thang Duong, Hongzhi Yin, Dung Hoang, Minn Hung Nguyen, M. Weidlich, Quoc Viet Hung Nguyen, K. Aberer","doi":"10.1109/ICDE48307.2020.00222","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00222","url":null,"abstract":"Effective information retrieval (IR) relies on the ability to comprehensively capture a user’s information needs. Traditional IR systems are limited to homogeneous queries that define the information to retrieve by a single modality. Support for heterogeneous queries that combine different modalities has been proposed recently. Yet, existing approaches for heterogeneous querying are computationally expensive, as they require several passes over the data to construct a query answer. In this paper, we propose an IR system that overcomes the computational challenges imposed by heterogeneous queries by adopting graph embeddings. Specifically, we propose graph-based models in which both data and queries incorporate information of different modalities. Then, we show how either representation is transformed into a graph embedding in the same space, capturing relations between information of different modalities. By grounding query processing in graph embeddings, we enable processing of heterogeneous queries with a single pass over the data representation. Our experiments on several real-world and synthetic datasets illustrate that our technique is able to return twice the amount of relevant information in comparison with several baselines, while being scalable to large-scale data.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"6 1","pages":"1994-1997"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87283235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Answering Skyline Queries over Incomplete Data with Crowdsourcing (Extended Abstract)
2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00235
Xiaoye Miao, Yunjun Gao, Su Guo, Lu Chen, Jianwei Yin, Qing Li
{"title":"Answering Skyline Queries over Incomplete Data with Crowdsourcing (Extended Abstract)","authors":"Xiaoye Miao, Yunjun Gao, Su Guo, Lu Chen, Jianwei Yin, Qing Li","doi":"10.1109/ICDE48307.2020.00235","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00235","url":null,"abstract":"Due to the pervasiveness of incomplete data, incomplete data queries are vital in a large number of real-life scenarios. Current models and approaches for incomplete data queries mainly rely on machine power. In this paper, we study the problem of skyline queries over incomplete data with crowdsourcing. We propose a novel query framework, termed BayesCrowd, on top of a Bayesian network and the typical c-table model on incomplete data. Considering budget and latency constraints, we present a suite of effective task selection strategies. In particular, since computing the probability of each object being an answer object is at least as hard as the #SAT problem, we propose an adaptive DPLL (i.e., Davis-Putnam-Logemann-Loveland) algorithm to speed up the computation. Extensive experiments using both real and synthetic data sets confirm the superiority of BayesCrowd to the state-of-the-art method.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"41 1","pages":"2032-2033"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86335997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PrefixFPM: A Parallel Framework for General-Purpose Frequent Pattern Mining
2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00208
Da Yan, Wenwen Qu, Guimu Guo, Xiaoling Wang
{"title":"PrefixFPM: A Parallel Framework for General-Purpose Frequent Pattern Mining","authors":"Da Yan, Wenwen Qu, Guimu Guo, Xiaoling Wang","doi":"10.1109/ICDE48307.2020.00208","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00208","url":null,"abstract":"Frequent pattern mining (FPM) has been a focused theme in data mining research for decades, but there is still no general programming framework that can be easily customized to mine different kinds of frequent patterns, and existing solutions to FPM over big transaction databases are IO-bound, rendering CPU cores underutilized even though FPM is NP-hard. This paper presents PrefixFPM, a general-purpose framework for FPM that is able to fully utilize the CPU cores in a multicore machine. PrefixFPM follows the idea of prefix projection to partition the workloads of FPM into independent tasks by divide and conquer. PrefixFPM exposes a unified programming interface to users who can customize it to mine their desired patterns, and the parallel execution engine is transparent to end-users and can be reused for mining all kinds of patterns. We have adapted the state-of-the-art serial algorithms for mining frequent patterns including subsequences, subtrees, and subgraphs on top of PrefixFPM, and extensive experiments demonstrate an excellent speedup ratio of PrefixFPM with the number of cores. A demo is available at https://youtu.be/PfioC0GDpsw; the code is available at https://github.com/yanlab19870714/PrefixFPM.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"45 1","pages":"1938-1941"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88085549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Automated Anomaly Detection in Large Sequences
2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00182
Paul Boniol, Michele Linardi, Federico Roncallo, Themis Palpanas
{"title":"Automated Anomaly Detection in Large Sequences","authors":"Paul Boniol, Michele Linardi, Federico Roncallo, Themis Palpanas","doi":"10.1109/ICDE48307.2020.00182","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00182","url":null,"abstract":"Subsequence anomaly (or outlier) detection in long sequences is an important problem with applications in a wide range of domains. However, current approaches have severe limitations: they either require prior domain knowledge, or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this work, we address these problems, and propose NorM, a novel approach, suitable for domain-agnostic anomaly detection. NorM is based on a new data series primitive, which permits detecting anomalies based on their (dis)similarity to a model that represents normal behavior. The experimental results on several real datasets demonstrate that the proposed approach outperforms the current state-of-the-art algorithms by a large margin in terms of accuracy, while being orders of magnitude faster.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"23 1","pages":"1834-1837"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90234173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 42
Computing Mutual Information of Big Categorical Data and Its Application to Feature Grouping
2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00210
Junli Li, Chaowei Zhang, Jifu Zhang, X. Qin
{"title":"Computing Mutual Information of Big Categorical Data and Its Application to Feature Grouping","authors":"Junli Li, Chaowei Zhang, Jifu Zhang, X. Qin","doi":"10.1109/ICDE48307.2020.00210","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00210","url":null,"abstract":"This paper develops a parallel computing system - MiCS - for mutual information of big categorical data on the Spark computing platform. By applying a column-wise transformation scheme, the MiCS algorithm is conducive to processing the large volume of highly repetitive mutual-information calculations among feature pairs. To improve the efficiency of MiCS and the utilization rate of Spark cluster resources, we adopt a virtual partitioning scheme to achieve balanced load while mitigating the data skewness problem in the Spark Shuffle process.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"1 1","pages":"1946-1949"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83101872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Adaptive Network Alignment with Unsupervised and Multi-order Convolutional Networks
2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00015
T. T. Huynh, Vinh Tong, T. Nguyen, Hongzhi Yin, M. Weidlich, Nguyen Quoc Viet Hung
{"title":"Adaptive Network Alignment with Unsupervised and Multi-order Convolutional Networks","authors":"T. T. Huynh, Vinh Tong, T. Nguyen, Hongzhi Yin, M. Weidlich, Nguyen Quoc Viet Hung","doi":"10.1109/ICDE48307.2020.00015","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00015","url":null,"abstract":"Network alignment is the problem of pairing nodes between two graphs such that the paired nodes are structurally and semantically similar. A well-known application of network alignment is to identify which accounts in different social networks belong to the same person. Existing alignment techniques, however, lack scalability, cannot incorporate multi-dimensional information without training data, and are limited in the consistency constraints enforced by an alignment. In this paper, we propose a fully unsupervised network alignment framework based on a multi-order embedding model. The model learns the embeddings of each node using a graph convolutional neural representation, which we prove to satisfy consistency constraints. We further design a data augmentation method and a refinement mechanism to make the model adaptive to consistency violations and noise. Extensive experiments on real and synthetic datasets show that our model outperforms state-of-the-art alignment techniques. We also demonstrate the robustness of our model against adversarial conditions, such as structural noises, attribute noises, graph size imbalance, and hyper-parameter sensitivity.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"64 1","pages":"85-96"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84484193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 52