{"title":"Crime Prediction With Missing Data Via Spatiotemporal Regularized Tensor Decomposition","authors":"Weichao Liang;Jie Cao;Lei Chen;Youquan Wang;Jia Wu;Amin Beheshti;Jiangnan Tang","doi":"10.1109/TBDATA.2023.3283098","DOIUrl":"10.1109/TBDATA.2023.3283098","url":null,"abstract":"The goal of crime prediction is to forecast the number of crime incidents at each region of a city based on the historical crime data. It has attracted a great deal of attention from both academic and industrial communities due to its considerable significance in improving urban safety and reducing financial losses. Although much progress has been made in this field, most of the existing approaches assume that the historical crime data are complete, which does not hold in many real-world scenarios. Meanwhile, crime incidents are affected by multiple factors and have intricate spatial, temporal, and categorical correlations, which are not fully utilized by the current methods. In this article, we propose a novel tensor decomposition based framework, named TD-Crime, to conduct prediction directly on the incomplete crime data. Specifically, we first organize the crime data as a tensor and then apply the nonnegative CP decomposition to it, which not only provides a natural solution to the missing data problem but also captures the spatial, temporal, and categorical correlations implicitly. Moreover, we attempt to exploit the spatial and temporal correlations explicitly by directly learning from the crime data to further improve the forecasting performance. Finally, we obtain a joint optimization problem and present an efficient alternating optimization scheme to find a satisfactory solution. 
Extensive experiments on the real-world crime datasets show that TD-Crime can address the crime prediction task effectively under different missing data scenarios.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 5","pages":"1392-1407"},"PeriodicalIF":7.2,"publicationDate":"2023-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42653830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Searching Density-Increasing Path to Local Density Peaks for Unsupervised Anomaly Detection","authors":"Jiachen Zhao;Fang Deng;Jiaqi Zhu;Jie Chen","doi":"10.1109/TBDATA.2023.3265509","DOIUrl":"10.1109/TBDATA.2023.3265509","url":null,"abstract":"Unsupervised anomaly detection (AD) is a challenging problem in the data mining community. Clustering-based AD methods aim to group normal data points into clusters and then regard a point belonging to none of the clusters as an anomaly. However, they may suffer from the problems of unknown cluster numbers and arbitrary cluster shapes. This paper presents a novel clustering-based AD method named Density-increasing Path (DIP) to tackle these challenges. DIP searches a path for each data point. The path starts at the data point itself, passes through several points with monotonically increasing densities, and ends at a density peak. Further, DIP defines the climbing difficulty of each path by combining the distance and density increment of each step along the path, which can be regarded as the anomaly score of the path starting point. DIP can adaptively decide the number of peaks to address the challenge of unknown cluster numbers. Since DIP requires the path to pass several points rather than directly reaching the peak, it handles arbitrary cluster shapes. We also propose the ensemble DIP to improve prediction accuracy. 
The experimental results on four synthetic datasets and eleven real-world benchmarks demonstrate that DIP outperforms existing methods.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 4","pages":"1198-1209"},"PeriodicalIF":7.2,"publicationDate":"2023-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41278104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Consistent Graph Neural Networks for Semi-Supervised Node Classification","authors":"Yanbei Liu;Shichuan Zhao;Xiao Wang;Lei Geng;Zhitao Xiao;Jerry Chun-Wei Lin","doi":"10.1109/TBDATA.2023.3266590","DOIUrl":"10.1109/TBDATA.2023.3266590","url":null,"abstract":"Graph Neural Networks (GNNs), the powerful graph representation technique based on deep learning, have attracted great research interest in recent years. Although many GNNs have achieved the state-of-the-art accuracy on a set of standard benchmark datasets, they are still limited to traditional semi-supervised framework and lack of sufficient supervision information, especially for the large amount of unlabeled data. To overcome this issue, we propose a novel self-consistent graph neural networks (SCGNN) framework to enrich the supervision information from two aspects: the self-consistency of unlabeled data and the label information of labeled data. First, in order to extract the self-supervision information from the numerous unlabeled nodes, we perform graph data augmentation and leverage a self-consistent constraint to maximize the mutual information of the unlabeled nodes across different augmented graph views. The self-consistency can sufficiently utilize the intrinsic structural attributes of the graph to extract the self-supervision information from unlabeled data and improve the subsequent classification result. Second, to further extract supervision information from scarce labeled nodes, we introduce a fusion mechanism to obtain comprehensive node embeddings by fusing node representations of two positive graph views, and optimize the classification loss over labeled nodes to maximize the utilization of label information. We conduct comprehensive empirical studies on six public benchmark datasets in node classification task. 
In terms of accuracy, SCGNN improves by an average of 2.08% over the best baseline, and specifically by 5.8% on the Disease dataset.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 4","pages":"1186-1197"},"PeriodicalIF":7.2,"publicationDate":"2023-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49370204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient and Secure Data Sharing Scheme on Interoperable Blockchain Database","authors":"Kun Hao;Junchang Xin;Zhiqiong Wang;Zhongming Yao;Guoren Wang","doi":"10.1109/TBDATA.2023.3265178","DOIUrl":"10.1109/TBDATA.2023.3265178","url":null,"abstract":"Interoperable Blockchain Database (IBD) can enable users to execute transactions for sharing data stored in various blockchains maintained by different organizations or communities in a transparent manner. However, compared to traditional distributed databases, IBD can hardly provide high-level security and scalability, which are caused by many factors, such as system architecture, consensus protocol, and interactive pattern. Among them, the consensus protocol is the most critical factor, since the credibility of consensus nodes inside the corresponding blockchains are difficult to be guaranteed. Additionally, the consensus protocol directly affects the verification efficiency for given transactions in IBD. In this paper, we formally concern the problem of secure data sharing in IBD. We present a scheme named Hybridchain to execute transactions for sharing data securely and efficiently. We first propose a novel concept named Interoperable Consensus Group (ICG) which organizes a set of basic consensus nodes into a group, each of which is responsible for managing at least one local blockchain. Then, we present an interoperable cross-chains consensus protocol to achieve eventual consistency of blockchain transactions. 
We conduct extensive experiments, and the evaluation results show that our proposed approach achieves superior performance.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 4","pages":"1171-1185"},"PeriodicalIF":7.2,"publicationDate":"2023-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41675603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Supervised Federated Adaptation for Multi-Site Brain Disease Diagnosis","authors":"Qiming Yang;Qi Zhu;Mingming Wang;Wei Shao;Zheng Zhang;Daoqiang Zhang","doi":"10.1109/TBDATA.2023.3264109","DOIUrl":"10.1109/TBDATA.2023.3264109","url":null,"abstract":"The multi-site approach has attracted increasing attention in brain disease diagnosis, because it can improve the prediction performance by integrating sample information from different medical institutions. However, its training procedure requires the transmission of subject's original images or features among sites, which may cause privacy disclosure. In this article, we propose a self-supervised federated adaptation (S2FA) framework for robust multi-site prediction, which can reduce the risk of privacy disclosure. As far as we know, it is the first work to investigate the cross-site brain disease diagnosis, which trains model on source sites and tests on target site, often occurring in clinical practice. First, we implement a decentralized federated optimization strategy, by which each site communicates model parameters periodically. Second, we construct an auxiliary self-supervised model for target site through transferring knowledge from source sites with self-paced learning. Then, a hash mapping is proposed to encode the target feature, simultaneously reducing the risk of privacy information disclosure and alleviating data heterogeneity among sites. Finally, we achieve the cross-site prediction by weighted federated source model and auxiliary target model. Experimental results on multi-site datasets show that the proposed S2FA can accurately identify brain disease. 
Our codes are available at https://github.com/nuaayqm/S2FA.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 5","pages":"1334-1346"},"PeriodicalIF":7.2,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49220442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality Inference in Federated Learning With Secure Aggregation","authors":"Balázs Pejó;Gergely Biczók","doi":"10.1109/TBDATA.2023.3280406","DOIUrl":"10.1109/TBDATA.2023.3280406","url":null,"abstract":"Federated learning algorithms are developed both for efficiency reasons and to ensure the privacy and confidentiality of personal and business data, respectively. Despite no data being shared explicitly, recent studies showed that the mechanism could still leak sensitive information. Hence, secure aggregation is utilized in many real-world scenarios to prevent attribution to specific participants. In this paper, we focus on the quality (i.e., the ratio of correct labels) of individual training datasets and show that such quality information could be inferred and attributed to specific participants even when secure aggregation is applied. Specifically, through a series of image recognition experiments, we infer the relative quality ordering of participants. Moreover, we apply the inferred quality information to stabilize training performance, measure the individual contribution of participants, and detect misbehavior.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 5","pages":"1430-1437"},"PeriodicalIF":7.2,"publicationDate":"2023-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6687317/10236926/10138056.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45853111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy and Efficiency of Communications in Federated Split Learning","authors":"Zongshun Zhang;Andrea Pinto;Valeria Turina;Flavio Esposito;Ibrahim Matta","doi":"10.1109/TBDATA.2023.3280405","DOIUrl":"10.1109/TBDATA.2023.3280405","url":null,"abstract":"Every day, large amounts of sensitive data are distributed across mobile phones, wearable devices, and other sensors. Traditionally, these enormous datasets have been processed on a single system, with complex models being trained to make valuable predictions. Distributed machine learning techniques such as Federated and Split Learning have recently been developed to protect user data and privacy better while ensuring high performance. Both of these distributed learning architectures have advantages and disadvantages. In this article, we examine these tradeoffs and suggest a new hybrid Federated Split Learning architecture that combines the efficiency and privacy benefits of both. Our evaluation demonstrates how our hybrid Federated Split Learning approach can lower the amount of processing power required by each client running a distributed learning system, and reduce training and inference time while keeping a similar accuracy. We also discuss the resiliency of our approach to deep learning privacy inference attacks and compare our solution to other recently proposed benchmarks.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 5","pages":"1380-1391"},"PeriodicalIF":7.2,"publicationDate":"2023-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42482111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Region Courier Displacement for On-Demand Delivery With Multi-Agent Reinforcement Learning","authors":"Shuai Wang;Shijie Hu;Baoshen Guo;Guang Wang","doi":"10.1109/TBDATA.2023.3262408","DOIUrl":"10.1109/TBDATA.2023.3262408","url":null,"abstract":"On-demand delivery has become prevailing for people to order meals and groceries online, especially during the pandemic. It is essential to dispatch massive orders to limited couriers to satisfy on-demand delivery users, especially during peak hours. Existing studies mainly focus on order dispatching within a region, and they are challenging to be applied to the cross-region courier displacement problem due to (1) unique practical factors, including regional spatial-temporal demand-supply dynamics and strict delivery time constraints, and (2) the large-scale setting and high-dimensional decision space given massive couriers in on-demand delivery. To address these challenges, in this work, we propose an efficient cross-region courier displacement framework, i.e., Courier Displacement Reinforcement Learning (short for CDRL) based on centralized multi-agent actor-critic, which first design the actor-critic network with a time-varying displacement intensity control module to capture demand-supply dynamics and utilize the centralized training and decentralized execution multi-agent framework to address the large-scale coordination. One-month real-world order records collected from one of the biggest on-demand delivery services in the world are utilized to show the performance of our design. 
The extensive results show that our method offers a 47.97% of increase in balancing supply and demand and reduces idle ride time by 24.62% simultaneously.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 5","pages":"1321-1333"},"PeriodicalIF":7.2,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42456979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of Mathematical Optimization in Data Visualization and Visual Analytics: A Survey","authors":"Guodao Sun;Zihao Zhu;Gefei Zhang;Chaoqing Xu;Yunchao Wang;Sujia Zhu;Baofeng Chang;Ronghua Liang","doi":"10.1109/TBDATA.2023.3262151","DOIUrl":"10.1109/TBDATA.2023.3262151","url":null,"abstract":"Mathematical optimization is the process of determining the set of globally or locally optimal parameters in a finite or infinite search space. It has been extensively employed in the research areas of computer science, engineering, operations research, and economics. The application of mathematical optimization has also been extended to data visualization, where it can enhance data processing, structure visualization, and facilitate exploration. However, the current state of summarization in the application of mathematical optimization in data visualization remains inadequate. In this article, we review and classify the existing techniques for advanced mathematical optimization in the fields of data visualization and visual analytics. The classification is conducted based on a classical visualization pipeline, including data enhancement and transformation, representation and rendering, as well as interactive exploration and analysis. We also discuss various mathematical optimization models and their solution methods to help readers gain a better understanding of the relationship among models, visualization, and application scenarios. We additionally provide an online exploration demo, which could enable users to interactively find relevant articles. 
Based on the limitations and potential trends revealed in the existing literature, we define future challenges in the cross-disciplinary of mathematical optimization and data visualization.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 4","pages":"1018-1037"},"PeriodicalIF":7.2,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49608202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-Modal Hypergraph Neural Network via Parametric Filtering and Feature Sampling","authors":"Zijian Liu;Yang Luo;Xitong Pu;Geyong Min;Chunbo Luo","doi":"10.1109/TBDATA.2023.3278988","DOIUrl":"10.1109/TBDATA.2023.3278988","url":null,"abstract":"In the real world, relationships between objects are often complex, involving multiple variables and modes. Hypergraph neural networks possess the capability to capture and represent such intricate relationships by deriving and inheriting their graph-based counterparts. Nevertheless, both graph and hypergraph neural networks suffer from the problem of over-smoothing when multiple graph convolution layers are stacked. To address this issue, this article introduces the Multi-modal Hypergraph Neural Network with Parametric Filtering and Feature Sampling (MHNet) to encode complex hypergraph features and mitigate over-smoothing. The proposed approach uses hypergraph structures to model high-order and multi-modal data correlations, a polynomial hypergraph filter to dynamically extract multi-scale node features through parametric polynomial fitting, and a feature sampling strategy to learn from sparse and labeled samples while avoiding overfitting. Experimental results on four hypergraph datasets and two multi-modal visual datasets demonstrate that the proposed MHNet outperforms state-of-the-art algorithms.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 5","pages":"1365-1379"},"PeriodicalIF":7.2,"publicationDate":"2023-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47901372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}