{"title":"Exploring Finer Granularity within the Cores: Efficient (k,p)-Core Computation","authors":"Chen Zhang, Fan Zhang, W. Zhang, Boge Liu, Ying Zhang, Lu Qin, Xuemin Lin","doi":"10.1109/ICDE48307.2020.00023","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00023","url":null,"abstract":"In this paper, we propose and study a novel cohesive subgraph model, named (k,p)-core, which is a maximal subgraph where each vertex has at least k neighbors and at least a fraction p of its neighbors in the subgraph. The model is motivated by the finding that each user in a community should have at least a certain fraction p of neighbors inside the community to ensure user engagement, especially for users with large degrees. Meanwhile, the uniform degree constraint k, as applied in the k-core model, guarantees a minimum level of user engagement in a community, and is especially effective for users with small degrees. We propose an O(m) algorithm to compute a (k,p)-core with given k and p, and an O(dm) algorithm to decompose a graph by (k,p)-cores, where m is the number of edges in the graph G and d is the degeneracy of G. A space-efficient index is designed for time-optimal (k,p)-core query processing. Novel techniques are proposed for the maintenance of the (k,p)-core index under graph dynamics. 
Extensive experiments on eight real-life datasets demonstrate that our (k,p)-core model is effective and the algorithms are efficient.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"14 1","pages":"181-192"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77833439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
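The O(m) computation admits a simple peeling reading: repeatedly delete any vertex whose remaining degree drops below k or below a p fraction of its original degree, and cascade deletions to its neighbors. Below is a minimal sketch of that reading (function and variable names are ours, not the paper's; the paper's index-based and decomposition algorithms are more involved):

```python
from collections import deque

def kp_core(adj, k, p):
    """Peeling sketch for the (k,p)-core: iteratively delete vertices
    with fewer than k surviving neighbors, or with less than a p
    fraction of their original neighbors surviving."""
    alive = set(adj)
    deg0 = {v: len(adj[v]) for v in adj}   # original degrees
    deg = dict(deg0)                       # degrees in the surviving subgraph

    def violates(v):
        return deg[v] < k or deg[v] < p * deg0[v]

    queue = deque(v for v in alive if violates(v))
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue
        alive.remove(v)
        for u in adj[v]:                   # cascade to neighbors
            if u in alive:
                deg[u] -= 1
                if violates(u):
                    queue.append(u)
    return alive
```

On a triangle {0,1,2} with a pendant vertex 3 attached to 0, `kp_core(adj, 2, 0.5)` peels vertex 3 and keeps the triangle.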
{"title":"ColumnSGD: A Column-oriented Framework for Distributed Stochastic Gradient Descent","authors":"Zhipeng Zhang, Wentao Wu, Jiawei Jiang, Lele Yu, B. Cui, Ce Zhang","doi":"10.1109/ICDE48307.2020.00134","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00134","url":null,"abstract":"Distributed machine learning (ML) has triggered tremendous research interest in recent years. Stochastic gradient descent (SGD) is one of the most popular algorithms for training ML models, and has been implemented in almost all distributed ML systems, such as Spark MLlib, Petuum, MXNet, and TensorFlow. However, current implementations often incur huge communication and memory overheads when it comes to large models. One important reason for this inefficiency is the row-oriented scheme (RowSGD) that existing systems use to partition the training data, which forces them to adopt a centralized model management strategy that leads to a vast amount of data exchange over the network. We propose a novel, column-oriented scheme (ColumnSGD) that partitions training data by columns rather than by rows. As a result, the ML model can be partitioned by columns as well, leading to a distributed configuration where individual data and model partitions can be collocated on the same machine. Following this locality property, we develop a simple yet powerful computation framework that significantly reduces communication overheads and memory footprints compared to RowSGD, for large-scale ML models such as generalized linear models (GLMs) and factorization machines (FMs). We implement ColumnSGD on top of Apache Spark, and study its performance both analytically and experimentally. 
Experimental results on both public and real-world datasets show that ColumnSGD is up to 930x faster than MLlib, 63x faster than Petuum, and 14x faster than MXNet.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"15 1","pages":"1513-1524"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78915073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
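The column-oriented layout can be illustrated with a toy, single-process simulation: data and model are sharded by feature columns, and the only cross-shard traffic per example is a scalar partial dot product. This sketch assumes a logistic-regression objective and round-robin column assignment (our choices, not necessarily the paper's); the real system runs shards on separate Spark executors:

```python
import math

def column_sgd(X, y, n_workers=2, lr=0.1, epochs=50):
    """Toy simulation of column-partitioned SGD for logistic regression:
    each 'worker' holds a disjoint slice of the feature columns and the
    matching model slice, exchanges only scalar partial dot products,
    and updates its own slice locally."""
    n, d = len(X), len(X[0])
    cols = [list(range(w, d, n_workers)) for w in range(n_workers)]  # column shards
    w_parts = [{j: 0.0 for j in part} for part in cols]              # model shards
    for _ in range(epochs):
        for i in range(n):
            # each worker computes a partial dot product over its columns
            partials = [sum(w_parts[wk][j] * X[i][j] for j in cols[wk])
                        for wk in range(n_workers)]
            pred = 1.0 / (1.0 + math.exp(-sum(partials)))  # only scalars aggregated
            g = pred - y[i]                                # logistic-loss gradient
            for wk in range(n_workers):                    # purely local updates
                for j in cols[wk]:
                    w_parts[wk][j] -= lr * g * X[i][j]
    return w_parts
```

On a trivially separable two-point dataset, the merged shards recover a separating direction.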
{"title":"Stochastic Origin-Destination Matrix Forecasting Using Dual-Stage Graph Convolutional, Recurrent Neural Networks","authors":"Jilin Hu, B. Yang, Chenjuan Guo, Christian S. Jensen, Hui Xiong","doi":"10.1109/ICDE48307.2020.00126","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00126","url":null,"abstract":"Origin-destination (OD) matrices are used widely in transportation and logistics to record the travel cost (e.g., travel speed or greenhouse gas emission) between pairs of OD regions during different intervals within a day. We model a travel cost as a distribution because when traveling between a pair of OD regions, different vehicles may travel at different speeds even during the same interval, e.g., due to different driving styles or different waiting times at intersections. This yields stochastic OD matrices. We consider an increasingly pertinent setting where a set of vehicle trips is used for instantiating OD matrices. Since the trips may not cover all OD pairs for each interval, the resulting OD matrices are likely to be sparse. We then address the problem of forecasting complete, near-future OD matrices from sparse, historical OD matrices. To solve this problem, we propose a generic learning framework that (i) employs matrix factorization and graph convolutional neural networks to contend with the data sparseness while capturing spatial correlations and that (ii) captures spatio-temporal dynamics via recurrent neural networks extended with graph convolutions. 
Empirical studies using two taxi trajectory data sets offer detailed insight into the properties of the framework and indicate that it is effective.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"20 1","pages":"1417-1428"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77863087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
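The sparseness-handling ingredient (i) can be illustrated in isolation: factor a sparse OD matrix over its observed entries only, then use the factors to fill the gaps. The sketch omits the graph-convolutional and recurrent components entirely, and all names and hyperparameters are illustrative, not the paper's:

```python
import random

def complete_od(matrix, rank=1, lr=0.05, epochs=4000, seed=0):
    """Fill missing (None) entries of a small OD matrix by gradient-descent
    matrix factorization over the observed entries: fit matrix ~ U @ V^T,
    then read predictions for every cell from U @ V^T."""
    rng = random.Random(seed)
    n, m = len(matrix), len(matrix[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(m)]
    obs = [(i, j, matrix[i][j]) for i in range(n) for j in range(m)
           if matrix[i][j] is not None]
    for _ in range(epochs):
        for i, j, x in obs:
            err = sum(U[i][r] * V[j][r] for r in range(rank)) - x
            for r in range(rank):
                ui, vj = U[i][r], V[j][r]   # use pre-update values for both
                U[i][r] -= lr * err * vj
                V[j][r] -= lr * err * ui
    return [[sum(U[i][r] * V[j][r] for r in range(rank)) for j in range(m)]
            for i in range(n)]
```

For the rank-1-consistent matrix [[1, 2], [2, ?]] the factors reconstruct the missing entry near 4.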
{"title":"An Anomaly Detection System for the Protection of Relational Database Systems against Data Leakage by Application Programs","authors":"Daren Fadolalkarim, E. Bertino, Asmaa Sallam","doi":"10.1109/ICDE48307.2020.00030","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00030","url":null,"abstract":"Application programs are a possible source of attacks to databases as attackers might exploit vulnerabilities in a privileged database application. They can perform code-injection or code-reuse attacks in order to steal sensitive data. However, as such attacks very often result in changes in the program’s behavior, program monitoring techniques represent an effective defense to detect on-going attacks. One such technique is monitoring the library/system calls that the application program issues while running. In this paper, we propose AD-PROM, an Anomaly Detection system that protects relational database systems against malicious or compromised application PROgraMs that aim to steal data. AD-PROM tracks calls executed by application programs on data extracted from a database. The system operates in two phases. The first phase statically and dynamically analyzes the behavior of the application in order to build profiles representing the application’s normal behavior. AD-PROM analyzes the control and data flow of the application program (i.e., static analysis), and builds a hidden Markov model trained by the program traces (i.e., dynamic analysis). During the second phase, the program execution is monitored in order to detect anomalies that may represent data leakage attempts. We have implemented AD-PROM and carried out experiments to assess its performance. 
The results showed that our system is highly accurate in detecting changes in the application programs’ behaviors and has very low false positive rates.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"87 20 1","pages":"265-276"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84043435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
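The two-phase profile-then-monitor workflow can be sketched with a deliberately simplified profile: a set of observed call-to-call transitions rather than AD-PROM's trained hidden Markov model (the HMM scores whole traces probabilistically; this stand-in only flags never-seen transitions):

```python
def build_profile(traces):
    """Phase 1 (simplified): record every call-to-call transition
    observed in normal runs of the application."""
    profile = set()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            profile.add((a, b))
    return profile

def is_anomalous(trace, profile):
    """Phase 2: flag a monitored run whose trace contains a transition
    never seen while profiling -- a possible code-injection/reuse symptom."""
    return any((a, b) not in profile for a, b in zip(trace, trace[1:]))
```

A run that issues a `write` where profiling only ever saw `read` after `open` would be flagged.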
{"title":"JODA: A Vertically Scalable, Lightweight JSON Processor for Big Data Transformations","authors":"Nico Schäfer, S. Michel","doi":"10.1109/ICDE48307.2020.00155","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00155","url":null,"abstract":"We describe the demonstration of JODA (Json On Demand Analytics), an approach to handling large amounts of JSON documents in a vertically scalable manner. With JODA, the user can import, filter, transform, aggregate, group, and export documents with a simple PIG-style query language, offering fast execution speed. This is achieved by utilizing a multithreaded architecture over disjoint, read-only containers of data that are processed in parallel, similar to what RDDs are to Spark. Containers are augmented with auxiliary information like Bloom filters and adaptive indices, and all containers are processed in parallel by individual threads. By avoiding locks, latches, and synchronization beyond simple thread pooling, we do not risk contention and therefore maximize resource utilization. The demonstration scenarios aim at engaging visitors with several data analytics tasks around large, real-world datasets that are to be solved with the help of JODA, and further give insights into system internals and the installation/configuration process.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"8 1","pages":"1726-1729"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89918367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
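The lock-free container model can be illustrated with a toy filter: split the documents into disjoint, read-only containers and process each in its own thread, concatenating results afterwards. Bloom filters, adaptive indices, and JODA's query language are omitted; this only shows the shared-nothing threading pattern:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_filter(docs, predicate, n_containers=4):
    """Split docs into disjoint read-only containers and filter each in
    its own thread; no locks are needed because containers never share
    mutable state and results are merged only at the end."""
    containers = [docs[i::n_containers] for i in range(n_containers)]
    with ThreadPoolExecutor(max_workers=n_containers) as pool:
        parts = pool.map(lambda c: [d for d in c if predicate(d)], containers)
    return [d for part in parts for d in part]
```

Note that results arrive container by container, so the output order differs from the input order unless re-sorted.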
{"title":"The Art of Efficient In-memory Query Processing on NUMA Systems: a Systematic Approach","authors":"Puya Memarzia, S. Ray, V. Bhavsar","doi":"10.1109/ICDE48307.2020.00073","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00073","url":null,"abstract":"Data analytics systems commonly utilize in-memory query processing techniques to achieve better throughput and lower latency. Modern computers increasingly rely on Non-Uniform Memory Access (NUMA) architectures to achieve scalability. A key drawback of NUMA architectures is that many existing software solutions are not aware of the underlying NUMA topology and thus do not take full advantage of the hardware. Modern operating systems are designed to provide basic support for NUMA systems. However, default system configurations are typically sub-optimal for large data analytics applications. Additionally, rewriting the application from the ground up is not always feasible. In this work, we evaluate a variety of strategies that aim to accelerate memory-intensive data analytics workloads on NUMA systems. Our findings indicate that the operating system default configurations can be detrimental to query performance. We analyze the impact of different memory allocators, memory placement strategies, thread placement, and kernel-level load balancing and memory management mechanisms. With extensive experimental evaluation, we demonstrate that the methodical application of these techniques can be used to obtain significant speedups in four commonplace in-memory query processing tasks, on three different hardware architectures. Furthermore, we show that these strategies can improve the performance of five popular database systems running a TPC-H workload. 
Lastly, we summarize our findings in a decision flowchart for practitioners.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"12 1","pages":"781-792"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91372173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Picube for Fast Exploration of Large Datasets","authors":"Wenxiao Fu","doi":"10.1109/ICDE48307.2020.00246","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00246","url":null,"abstract":"Hierarchical aggregation supports fast exploration of large datasets by pre-aggregating data into a multi-scale data structure. Although the pre-aggregation is done offline, it can be quite expensive, and the resulting data structure can be extremely large. When the data is multi-dimensional, this is greatly compounded. Data-cube-based approaches can result in extremely large cube structures, aggregating more than is needed. Other approaches do not aggregate enough, and so do not offer the necessary flexibility for dimension-wise roll-ups and drill-downs. We design a hierarchical data structure for aggregation that strikes a balance, and provides enough flexibility for different exploration scenarios with low-cost construction and reasonable size. Inductive aggregation is a methodology to compute levels of aggregations efficiently, while the resulting data structure supports smooth data exploration. Inspired by this, we propose a partitioned, inductively aggregated data cube, picube. A framework we call the stratum space is presented along with the model to express the dependencies across aggregation levels. 
Optimization choices are discussed, providing good design tradeoffs between storing and querying the data.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"229 1","pages":"2069-2073"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85577506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
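The inductive-aggregation idea, computing each coarser level from the level directly below it rather than from the raw data, can be sketched on a 1-D sum pyramid. This does not model picube's partitioning or stratum space; it only shows why multi-level pre-aggregation is cheap:

```python
def inductive_levels(values, fanout=2):
    """Build a pyramid of aggregation levels inductively: level i+1 sums
    groups of `fanout` cells of level i, so each level costs O(size of
    the previous level) instead of O(size of the raw data)."""
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sum(prev[i:i + fanout])
                       for i in range(0, len(prev), fanout)])
    return levels
```

A drill-down then just steps to a finer level of the pyramid; a roll-up steps to a coarser one.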
{"title":"Effective and Efficient Truss Computation over Large Heterogeneous Information Networks","authors":"Yixing Yang, Yixiang Fang, Xuemin Lin, W. Zhang","doi":"10.1109/ICDE48307.2020.00083","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00083","url":null,"abstract":"Recently, the topic of truss computation has gained plenty of attention, where the k-truss of a graph is the maximum subgraph in which each edge participates in at least (k-2) triangles. Existing solutions mainly focus on homogeneous networks, where vertices are of the same type, and thus cannot be applied to heterogeneous information networks (HINs), which consist of multi-typed and interconnected objects, such as the bibliographic networks and knowledge graphs. In this paper, we study the problem of truss computation over HINs, which aims to find groups of vertices that are of the same type and densely connected. To model the relationship between two vertices of the same type, we adopt the well-known concept of meta-path, which is a sequence of vertex types and edge types between two given vertex types. We then introduce two kinds of HIN triangles for three vertices, regarding a specific meta-path P. The first one requires that each pair of vertices is connected by an instance of P, while the second one also has such a connectivity constraint but further requires that the three instances of P form a circle. Based on these two kinds of triangles, we propose two corresponding HIN truss models. We further develop efficient truss computation algorithms. 
We have performed extensive experiments on five real large HINs, and the results show that the proposed solutions are highly effective and efficient.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"144 1","pages":"901-912"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77546871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
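Both HIN truss models build on the classic peeling skeleton of the homogeneous k-truss: repeatedly delete edges that participate in fewer than (k-2) triangles. The HIN variants count meta-path-based triangles instead of plain ones; the sketch below shows only the homogeneous case:

```python
from collections import defaultdict

def k_truss(edges, k):
    """Homogeneous k-truss by peeling: delete every edge whose support
    (number of triangles containing it, i.e. common neighbors of its
    endpoints) is below k-2, and iterate to a fixpoint."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if u < v:
                    support = len(adj[u] & adj[v])  # triangles through (u, v)
                    if support < k - 2:
                        adj[u].discard(v)
                        adj[v].discard(u)
                        changed = True
    return {(u, v) for u in adj for v in adj[u] if u < v}
```

On a 4-clique with one pendant edge, the 4-truss keeps exactly the clique's six edges.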
{"title":"TrajMesa: A Distributed NoSQL Storage Engine for Big Trajectory Data","authors":"Ruiyuan Li, Huajun He, Rubin Wang, Sijie Ruan, Y. Sui, Jie Bao, Yu Zheng","doi":"10.1109/ICDE48307.2020.00224","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00224","url":null,"abstract":"Trajectory data is very useful for many urban applications. However, due to its spatio-temporal and high-volume properties, it is challenging to manage trajectory data. Existing trajectory data management frameworks suffer from scalability problems, and only support limited trajectory queries. This paper proposes a holistic distributed NoSQL trajectory storage engine, TrajMesa, based on GeoMesa, an open-source indexing toolkit for spatio-temporal data. TrajMesa adopts a novel storage schema, which reduces the storage size tremendously. We also devise novel indexing key designs, and propose a set of pruning strategies. TrajMesa can support plentiful queries efficiently, including ID-temporal queries, spatial range queries, similarity queries, and k-NN queries. Experimental results demonstrate the query efficiency and scalability of TrajMesa.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"3 1","pages":"2002-2005"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73705074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
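A common ingredient of GeoMesa-style indexing key designs is a space-filling curve, which maps nearby points to lexicographically nearby row keys so that range queries touch contiguous key ranges. A minimal Z-order (bit-interleaving) sketch is below; TrajMesa's actual keys also fold in time and trajectory IDs, which this does not attempt:

```python
def z_order_key(x, y, bits=16):
    """Interleave the bits of two grid coordinates into one integer key
    (a Z-order curve): x's bits land on even positions, y's on odd ones,
    so points that share high-order coordinate bits share key prefixes."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bit -> even position
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bit -> odd position
    return key
```

For example, (3, 0) and (0, 3) map to different keys (0b101 and 0b1010) even though they sum the same bits, because the interleaving preserves which dimension each bit came from.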
{"title":"Billion-scale Recommendation with Heterogeneous Side Information at Taobao","authors":"A. Pfadler, Huan Zhao, Jizhe Wang, Lifeng Wang, Pipei Huang, Lee","doi":"10.1109/ICDE48307.2020.00148","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00148","url":null,"abstract":"In recent years, embedding models based on the skip-gram algorithm have been widely applied to real-world recommendation systems (RSs). When designing embedding-based methods for recommendation at Taobao, there are three main challenges: scalability, sparsity and cold start. The first problem is inherently caused by the extremely large numbers of users and items (in the order of billions), while the remaining two problems are caused by the fact that most items have only very few (or none at all) user interactions. To address these challenges, in this work, we present a flexible and highly scalable Side Information (SI) enhanced Skip-Gram (SISG) framework, which is deployed at Taobao. SISG overcomes the drawbacks of existing embedding-based models by modeling user metadata and capturing asymmetries of user behavior. Furthermore, as training SISG can be performed using any SGNS implementation, we present our production deployment of SISG on a custom-built word2vec engine, which allows us to compute item and SI embedding vectors for billion-scale sets of products in a joint semantic space on a daily basis. 
Finally, using offline and online experiments we demonstrate the significant superiority of SISG over our previously deployed framework, EGES, and a well-tuned CF, as well as present evidence supporting our scalability claims.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"59 1","pages":"1667-1676"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74644457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
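Since the abstract notes that SISG can be trained with any SGNS implementation, the textbook skip-gram-with-negative-sampling update is sketched below for reference. This is not Taobao's engine, and the embedding-table layout (plain dicts of id to vector) is an assumption for illustration:

```python
import math

def sgns_step(center, context, negatives, emb_in, emb_out, lr=0.025):
    """One SGNS gradient step: pull the input vector of `center` toward
    the output vector of the observed `context`, and push it away from
    the output vectors of the sampled `negatives`."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    v = emb_in[center]
    grad_v = [0.0] * len(v)
    for item, label in [(context, 1.0)] + [(neg, 0.0) for neg in negatives]:
        u = emb_out[item]
        g = sigmoid(dot(v, u)) - label       # d loss / d score
        for i in range(len(v)):
            grad_v[i] += g * u[i]            # accumulate before updating u
            u[i] -= lr * g * v[i]            # update output vector in place
    for i in range(len(v)):
        v[i] -= lr * grad_v[i]               # update input vector in place
```

Repeated steps increase the inner product between a center item and its observed context while decreasing it for negatives.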