Boxuan Li, Reynold Cheng, Jiafeng Hu, Yixiang Fang, Min Ou, Ruibang Luo, K. Chang, Xuemin Lin
{"title":"MC-Explorer: Analyzing and Visualizing Motif-Cliques on Large Networks","authors":"Boxuan Li, Reynold Cheng, Jiafeng Hu, Yixiang Fang, Min Ou, Ruibang Luo, K. Chang, Xuemin Lin","doi":"10.1109/ICDE48307.2020.00154","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00154","url":null,"abstract":"Large networks with labeled nodes are prevalent in various applications, such as biological graphs, social networks, and e-commerce graphs. To extract insight from this rich information source, we propose MC-Explorer, which is an advanced analysis and visualization system. A highlight of MC-Explorer is its ability to discover motif-cliques from a graph with labeled nodes. A motif, such as a 3-node triangle, is a fundamental building block of a graph. A motif-clique is a \"complete\" subgraph in a network with respect to a desired higher-order connection pattern. For example, on a large biological graph, we found out some motif-cliques, which disclose new side effects of a drug, and potential drugs for healing diseases. MC-Explorer includes online and interactive facilities for exploring a large labeled network through the use of motif-cliques. We will demonstrate how MC-Explorer can facilitate the analysis and visualization of a labeled biological network.An online demo video of MC-Explorer can be accessed from https://www.dropbox.com/s/vkalumc28wqp8yl/demo.mov","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"15 1","pages":"1722-1725"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87407339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tiantian Liu, Zijin Feng, Huan Li, Hua Lu, M. A. Cheema, Hong Cheng, Jianliang Xu
{"title":"Shortest Path Queries for Indoor Venues with Temporal Variations","authors":"Tiantian Liu, Zijin Feng, Huan Li, Hua Lu, M. A. Cheema, Hong Cheng, Jianliang Xu","doi":"10.1109/ICDE48307.2020.00227","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00227","url":null,"abstract":"Indoor shortest path query (ISPQ) is of fundamental importance for indoor location-based services (LBS). However, existing ISPQs ignore indoor temporal variations, e.g., the open and close times associated with entities like doors and rooms. In this paper, we define a new type of query called Indoor Temporal-variation aware Shortest Path Query (ITSPQ). It returns the valid shortest path based on the up-to-date indoor topology at the query time. A set of techniques is designed to answer ITSPQ efficiently. We design a graph structure (IT-Graph) that captures indoor temporal variations. To process ITSPQ using IT-Graph, we design two algorithms that check a door’s accessibility synchronously and asynchronously, respectively. We experimentally evaluate the proposed techniques using synthetic data. The results show that our methods are efficient.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"19 1","pages":"2014-2017"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80092897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advances in Cryptography and Secure Hardware for Data Outsourcing","authors":"Shantanu Sharma, A. Burtsev, S. Mehrotra","doi":"10.1109/ICDE48307.2020.00173","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00173","url":null,"abstract":"Despite extensive research, secure outsourcing remains an open challenge. This tutorial focuses on recent advances in secure cloud-based data outsourcing based on cryptographic (encryption, secret-sharing, and multi-party computation (MPC)) and hardware-based approaches. We highlight the strengths and weaknesses of state-of-the-art techniques, and conclude that, while no single approach is likely to emerge as a silver bullet. Thus, the key is to merge different hardware and software techniques to work in conjunction using partitioned computing wherein a computation is split across different cryptographic techniques carefully, so as not to compromise security. We highlight some recent work in that direction.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"6 1","pages":"1798-1801"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88493407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Philipp Fent, Alexander van Renen, Andreas Kipf, Viktor Leis, Thomas Neumann, A. Kemper
{"title":"Low-Latency Communication for Fast DBMS Using RDMA and Shared Memory","authors":"Philipp Fent, Alexander van Renen, Andreas Kipf, Viktor Leis, Thomas Neumann, A. Kemper","doi":"10.1109/ICDE48307.2020.00131","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00131","url":null,"abstract":"While hardware and software improvements greatly accelerated modern database systems’ internal operations, the decades-old stream-based Socket API for external communication is still unchanged. We show experimentally, that for modern high-performance systems networking has become a performance bottleneck. Therefore, we argue that the communication stack needs to be redesigned to fully exploit modern hardware—as has already happened to most other database system components.We propose L5, a high-performance communication layer for database systems. L5 rethinks the flow of data in and out of the database system and is based on direct memory access techniques for intra-datacenter (RDMA) and intra-machine communication (Shared Memory). With L5, we provide a building block to accelerate ODBC-like interfaces with a unified and message-based communication framework. Our results show that using interconnects like RDMA (InfiniBand), RoCE (Ethernet), and Shared Memory (IPC), L5 can largely eliminate the network bottleneck for database systems.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"1 1","pages":"1477-1488"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88795461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demythization of Structural XML Query Processing: Comparison of Holistic and Binary Approaches (Extended Abstract)","authors":"Petr Lukáš, Radim Bača, M. Krátký, T. Ling","doi":"10.1109/ICDE48307.2020.00234","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00234","url":null,"abstract":"Structural XQuery and XPath queries are often modeled by twig pattern queries (TPQs) specifying predicates on XML nodes and structural relationships to be satisfied between them. This paper considers a TPQ model extended by a specification of output and non-output query nodes since it complies with the XQuery and XPath semantics. There are two types of TPQ processing approaches: binary joins and holistic joins. The binary joins utilize a query plan of interconnected binary operators, whereas the holistic joins are based on one complex operator to process the whole query. In the recent years, the holistic joins have been considered as the state-of-the-art TPQ processing method. However, a thorough analytical and experimental comparison of binary and holistic joins has been missing despite an enormous research effort in this area. In this paper, we try to fill this gap. We introduce several improvements of the binary join operators which enable us to build a so-called fully-pipelined (FP) query plan for any TPQ with the specification of output and non-output query nodes. We analytically show that, for a class of queries, the proposed approach has the same time and space complexity as holistic joins, and we experimentally demonstrate that the proposed approach outperforms holistic joins in many cases.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"124 1","pages":"2030-2031"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77261566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yan Zhu, Chin-Chia Michael Yeh, Zachary Zimmerman, Eamonn J. Keogh
{"title":"Matrix Profile XVII: Indexing the Matrix Profile to Allow Arbitrary Range Queries","authors":"Yan Zhu, Chin-Chia Michael Yeh, Zachary Zimmerman, Eamonn J. Keogh","doi":"10.1109/ICDE48307.2020.00185","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00185","url":null,"abstract":"Since its introduction several years ago, the Matrix Profile has received significant attention for two reasons. First, it is a very general representation, allowing for the discovery of time series motifs, discords, chains, joins, shapelets, segmentations etc. Secondly, it can be computed very efficiently, allowing for fast exact computation and ultra-fast approximate computation. For analysts that use the Matrix Profile frequently, its incremental computability means that they can perform ad-hoc analytics at any time, with almost no delay time. However, they can only issue global queries. That is, queries that consider all the data from time zero to the current time. This is a significant limitation, as they may be interested in localized questions about a contiguous subset of the data. For example, \"do we have any unusual motifs that correspond with that unusually cool summer two years ago\". Such ad-hoc queries would require recomputing the Matrix Profile for the time period in question. This is not an untenable computation, but it could not be done in interactive time. In this work we introduce a novel indexing framework that allows queries about arbitrary ranges to be answered in quasilinear time, allowing such queries to be interactive for the first time.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"7 1","pages":"1846-1849"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87082281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jong-Seop Lee, Seokwon Kang, Yongseung Yu, Yong-Yeon Jo, Sang-Wook Kim, Yongjun Park
{"title":"Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks","authors":"Jong-Seop Lee, Seokwon Kang, Yongseung Yu, Yong-Yeon Jo, Sang-Wook Kim, Yongjun Park","doi":"10.1109/ICDE48307.2020.00085","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00085","url":null,"abstract":"Sparse matrix multiplication (spGEMM) is widely used to analyze the sparse network data, and extract important information based on matrix representation. As it contains a high degree of data parallelism, many efficient implementations using data-parallel programming platforms such as CUDA and OpenCL have been introduced on graphic processing units (GPUs). Several well-known spGEMM techniques, such as cuS- PARSE and CUSP, often do not utilize the GPU resources fully, owing to the load imbalance between threads in the expansion process and high memory contention in the merge process. Furthermore, even though several outer-product-based spGEMM techniques are proposed to solve the load balancing problem on expansion, they still do not utilize the GPU resources fully, because severe computation load variations exist among the multiple thread blocks.To solve these challenges, this paper proposes a new optimization pass called Block Reorganizer, which balances the total computations of each computing unit on target GPUs, based on the outer-product-based expansion process, and reduces the memory pressure during the merge process. For expansion, it first identifies the actual computation amount for each block, and then performs two thread block transformation processes based on their characteristics: 1) B-Splitting to transform a heavy-computation blocks into multiple small blocks and 2) B- Gathering to aggregate multiple small-computation blocks to a larger block. While merging, it improves the overall performance by performing B-Limiting to limit the number of blocks on each computing unit. Experimental results show that it improves the total performance of kernel execution by 1.43x, on an average, when compared to the row-product-based spGEMM, for NVIDIA Titan Xp GPUs on real-world datasets.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"122 1","pages":"925-936"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87691473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCLPD: Smart Cargo Loading Plan Decision Framework","authors":"Jiaye Liu, Jiali Mao, Jiajun Liao, Huiqi Hu, Ye Guo, Aoying Zhou","doi":"10.1109/ICDE48307.2020.00163","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00163","url":null,"abstract":"The rapid development of steel logistics industry still has not effectively address such issues as truck overload and order overdue as well as cargo overstock. One of the reasons lie in limited number of trucks for transporting large scale cargos. More importantly, traditional methods attend to distribute cargos to trucks with the aim of maximizing the loading of each truck. But they ignore the priority level of orders and the expiration date of cargos stored in the warehouses, which have critical influences on profits of steel logistics industry. Hence, it necessitates an appropriate cargo distribution mechanism under the precondition of limited transportation capacity resources, to guarantee the maximization of delivery proportion for high-priority cargos. Recently, tremendous logistics data has been produced and are being in constant increment hourly in steel logistics platform. However, there is no existing solution to transform such data into actionable scheme to improve cargo distributing effectiveness. This paper puts forward a system implementation of smart cargo loading plan decision framework (SCLPD for short) for steel logistics industry. Through analysis on numerous real data cargo loading plan and inventory of warehouse, some important rules related to cargo distribution process are extracted. Additionally, consider that different amounts of trucks arriving in different time periods, based on adaptive time window model, a two- layer searching mechanism consisting of a genetic algorithm and A* algorithm is designed to ensure global optimization of cargo loading plan for the trucks in all time periods. In our demonstration, we illustrate the procedure of matching for cargos and trucks in various time windows, and showcase the comparison experimental results between the traditional method and SCLPD by the measurement of delivery proportion for high- priority cargos. The effectiveness and practicality of SCLPD enables efficient cargo loading plan generation, to meet the real- world requirements from steel logistics platform.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"1 1","pages":"1758-1761"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87767058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zijian Li, Wenhao Zheng, Xueling Lin, Ziyuan Zhao, Zhe Wang, Yue Wang, Xun Jian, Lei Chen, Qiang Yan, Tiezheng Mao
{"title":"TransN: Heterogeneous Network Representation Learning by Translating Node Embeddings","authors":"Zijian Li, Wenhao Zheng, Xueling Lin, Ziyuan Zhao, Zhe Wang, Yue Wang, Xun Jian, Lei Chen, Qiang Yan, Tiezheng Mao","doi":"10.1109/ICDE48307.2020.00057","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00057","url":null,"abstract":"Learning network embeddings has attracted growing attention in recent years. However, most of the existing methods focus on homogeneous networks, which cannot capture the important type information in heterogeneous networks. To address this problem, in this paper, we propose TransN, a novel multi-view network embedding framework for heterogeneous networks. Compared with the existing methods, TransN is an unsupervised framework which does not require node labels or user-specified meta-paths as inputs. In addition, TransN is capable of handling more general types of heterogeneous networks than the previous works. Specifically, in our framework TransN, we propose a novel algorithm to capture the proximity information inside each single view. Moreover, to transfer the learned information across views, we propose an algorithm to translate the node embeddings between different views based on the dual-learning mechanism, which can both capture the complex relations between node embeddings in different views, and preserve the proximity information inside each view during the translation. We conduct extensive experiments on real-world heterogeneous networks, whose results demonstrate that the node embeddings generated by TransN outperform those of competitors in various network mining tasks.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"65 1","pages":"589-600"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86465642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TIDY: Publishing a Time Interval Dataset with Differential Privacy (Extended abstract)","authors":"Woohwan Jung, Suyong Kwon, Kyuseok Shim","doi":"10.1109/ICDE48307.2020.00229","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00229","url":null,"abstract":"Log data from mobile devices usually contain a series of events with time intervals. However, the problem of releasing differentially private time interval data has not been tackled yet. We propose the TIDY (publishing Time Intervals via Differential privacY) algorithm to release time interval data under differential privacy. We use the frequency vectors as a compact representation of the time interval data to reduce the aggregated noise. We also develop a new partitioning method adapted for the frequency vectors to balance the trade-off between the noise and structural errors. Our experiments confirm that TIDY outperforms the existing algorithms for releasing 2D histograms.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"06 1","pages":"2020-2021"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85974856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}