{"title":"A Survey of Blockchain-Based Schemes for Data Sharing and Exchange","authors":"Rui Song;Bin Xiao;Yubo Song;Songtao Guo;Yuanyuan Yang","doi":"10.1109/TBDATA.2023.3293279","DOIUrl":"10.1109/TBDATA.2023.3293279","url":null,"abstract":"Data immutability, transparency and decentralization of blockchain make it widely used in various fields, such as Internet of things, finance, energy and healthcare. With the advent of the Big Data era, various companies and organizations urgently need data from other parties for data analysis and mining to provide better services. Therefore, data sharing and data exchange have become an enormous industry. Traditional centralized data platforms face many problems, such as privacy leakage, high transaction costs and lack of interoperability. Introducing blockchain into this field can address these problems, while providing decentralized data storage and exchange, access control, identity authentication and copyright protection. Although many impressive blockchain-based schemes for data sharing or data exchange scenarios have been presented in recent years, there is still a lack of review and summary of work in this area. In this paper, we conduct a detailed survey of blockchain-based data sharing and data exchange platforms, discussing the latest technical architectures and research results in this field. In particular, we first survey the current blockchain-based data sharing solutions and provide a detailed analysis of system architecture, access control, interoperability, and security. We then review blockchain-based data exchange systems and data marketplaces, discussing trading process, monetization, copyright protection and other related topics.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1477-1495"},"PeriodicalIF":7.2,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event Extraction by Associating Event Types and Argument Roles","authors":"Qian Li;Shu Guo;Jia Wu;Jianxin Li;Jiawei Sheng;Hao Peng;Lihong Wang","doi":"10.1109/TBDATA.2023.3291563","DOIUrl":"10.1109/TBDATA.2023.3291563","url":null,"abstract":"Event extraction (EE), which acquires structural event knowledge from texts, can be divided into two sub-tasks: event type classification and element extraction (namely identifying triggers and arguments under different role patterns). As different event types always own distinct extraction schemas (i.e., role patterns), previous work on EE usually follows an isolated learning paradigm, performing element extraction independently for different event types. It ignores meaningful associations among event types and argument roles, leading to relatively poor performance for less frequent types/roles. This paper proposes a novel neural association framework for the EE task. Given a document, it first performs type classification via constructing a document-level event graph to associate sentence nodes of different types and adopting a document-awared graph attention network to learn sentence embeddings. Then, element extraction is achieved by building a new schema of argument roles, with a type-awared parameter inheritance mechanism to enhance role preference for extracted elements. As such, our model takes into account type and role associations during EE, enabling implicit information sharing among them. Experimental results show that our approach consistently outperforms most state-of-the-art EE methods in both sub-tasks, especially at least 2.51% and 1.12% improvement of the event trigger identification and argument role classification sub-tasks. Particularly, for types/roles with less training data, the performance is superior to the existing methods.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1549-1560"},"PeriodicalIF":7.2,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88360714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized Interventions to Increase the Employment Success of People With Disability","authors":"Ha Xuan Tran;Thuc Duy Le;Jiuyong Li;Lin Liu;Jixue Liu;Yanchang Zhao;Tony Waters","doi":"10.1109/TBDATA.2023.3291547","DOIUrl":"10.1109/TBDATA.2023.3291547","url":null,"abstract":"An emerging problem in Disability Employment Services (DES) is recommending to people with disability the right skill to upgrade and the right upgrade level to achieve maximum improvement in their employment success. This problem requires causal reasoning to estimate the individual causal effect of possible factors on the outcome to determine the most effective intervention. In this paper, we propose a causal graph based framework to solve the intervention recommendation problem for survival outcome (job retention time) and non-survival outcome (employment status). For an individual, a personalized causal graph is predicted for them. It indicates which factors affect the outcome and their causal effects at different intervention levels. Based on the causal graph, we can determine the most effective intervention for an individual, i.e., the one that can generate a maximum outcome increase. Experiments with two case studies show that our framework can help people with disability increase their employment success. Evaluations with public datasets also show the advantage of our framework in other applications.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1561-1574"},"PeriodicalIF":7.2,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey of Visual Affordance Recognition Based on Deep Learning","authors":"Dongpan Chen;Dehui Kong;Jinghua Li;Shaofan Wang;Baocai Yin","doi":"10.1109/TBDATA.2023.3291558","DOIUrl":"10.1109/TBDATA.2023.3291558","url":null,"abstract":"Visual affordance recognition is an important research topic in robotics, human-computer interaction, and other computer vision tasks. In recent years, deep learning-based affordance recognition methods have achieved remarkable performance. However, there is no unified and intensive survey of these methods up to now. Therefore, this article reviews and investigates existing deep learning-based affordance recognition methods from a comprehensive perspective, hoping to pursue greater acceleration in this research domain. Specifically, this article first classifies affordance recognition into five tasks, delves into the methodologies of each task, and explores their rationales and essential relations. Second, several representative affordance recognition datasets are investigated carefully. Third, based on these datasets, this article provides a comprehensive performance comparison and analysis of the current affordance recognition methods, reporting the results of different methods on the same datasets and the results of each method on different datasets. Finally, this article summarizes the progress of affordance recognition, outlines the existing difficulties and provides corresponding solutions, and discusses its future application trends.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1458-1476"},"PeriodicalIF":7.2,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate Time-Series Forecasting Model: Predictability Analysis and Empirical Study","authors":"Qinpei Zhao;Guangda Yang;Kai Zhao;Jiaming Yin;Weixiong Rao;Lei Chen","doi":"10.1109/TBDATA.2023.3288693","DOIUrl":"10.1109/TBDATA.2023.3288693","url":null,"abstract":"Multivariate time series forecasting has wide applications such as traffic flow prediction, supermarket commodity demand forecasting and etc., and a large number of forecasting models have been developed. Given these models, a natural question has been raised: what theoretical limits of forecasting accuracy can these models achieve? Recent works of urban human mobility prediction have made progress on the maximum predictability that any algorithm can achieve. However, existing approaches on maximum predictability on the multivariate time series fully ignore the interrelationship between multiple variables. In this article, we propose a methodology to measure the upper limit of predictability for multivariate time series with multivariate constraint relations. The key of the proposed methodology is a novel entropy, named Multivariate Constraint Sample Entropy (\u0000<italic>McSE</i>\u0000), to incorporate the multivariate constraint relations for better predictability. We conduct a systematic evaluation over eight datasets and compare existing methods with our proposed predictability and find that we get a higher predictability. We also find that the forecasting algorithms that capture the multivariate constraint relation information, such as GNN, can achieve higher accuracy, confirming the importance of multivariate constraint relations for predictability.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1536-1548"},"PeriodicalIF":7.2,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Privacy-Aware Causal Structure Learning in Federated Setting","authors":"Jianli Huang;Xianjie Guo;Kui Yu;Fuyuan Cao;Jiye Liang","doi":"10.1109/TBDATA.2023.3285477","DOIUrl":"10.1109/TBDATA.2023.3285477","url":null,"abstract":"Causal structure learning has been extensively studied and widely used in machine learning and various applications. To achieve an ideal performance, existing causal structure learning algorithms often need to centralize a large amount of data from multiple data sources. However, in the privacy-preserving setting, it is impossible to centralize data from all sources and put them together as a single dataset. To preserve data privacy, federated learning as a new learning paradigm has attached much attention in machine learning in recent years. In this paper, we study a privacy-aware causal structure learning problem in the federated setting and propose a novel federated PC (FedPC) algorithm with two new strategies for preserving data privacy without centralizing data. Specifically, we first propose a novel layer-wise aggregation strategy for a seamless adaptation of the PC algorithm into the federated learning paradigm for federated skeleton learning, then we design an effective strategy for learning consistent separation sets for federated edge orientation. The extensive experiments validate that FedPC is effective for causal structure learning in federated learning setting.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1525-1535"},"PeriodicalIF":7.2,"publicationDate":"2023-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77771046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RGSE: Robust Graph Structure Embedding for Anomalous Link Detection","authors":"Zhen Liu;Wenbo Zuo;Dongning Zhang;Xiaodong Feng","doi":"10.1109/TBDATA.2023.3284270","DOIUrl":"10.1109/TBDATA.2023.3284270","url":null,"abstract":"Anomalous links such as noisy links or adversarial edges widely exist in real-world networks, which may undermine the credibility of the network study, e.g., community detection in social networks. Therefore, anomalous links need to be removed from the polluted network by a detector. Due to the co-existence of normal links and anomalous links, how to identify anomalous links in a polluted network is a challenging issue. By designing a robust graph structure embedding framework, also called RGSE, the link-level feature representations that are generated from both global embedding view and local stable view can be used for anomalous link detection on contaminated graphs. Comparison experiments on a variety of datasets demonstrate that the new model and its variants achieve up to an average 5.2% improvement with respect to the accuracy of anomalous link detection against the traditional graph representation models. Further analyses also provide interpretable evidence to support the model's superiority.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 5","pages":"1420-1429"},"PeriodicalIF":7.2,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46406222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outsourced Privacy-Preserving Data Alignment on Vertically Partitioned Database","authors":"Zhuzhu Wang;Cui Hu;Bin Xiao;Yang Liu;Teng Li;Zhuo Ma;Jianfeng Ma","doi":"10.1109/TBDATA.2023.3284271","DOIUrl":"10.1109/TBDATA.2023.3284271","url":null,"abstract":"In the context of real-world secure outsourced computations, private data alignment has been always the essential preprocessing step. However, current private data alignment schemes, mainly circuit-based, suffer from high communication overhead and often need to transfer potentially gigabytes of data. In this paper, we propose a lightweight private data alignment protocol (called SC-PSI) that can overcome the bottleneck of communication. Specifically, SC-PSI involves four phases of computations, including data preprocessing, data outsourcing, private set member (PSM) evaluation and circuit computation (CC). Like prior works, the major overhead of SC-PSI mainly lies in the latter two phases. The improvement is SC-PSI utilizes the function secret sharing technique to develop the PSM protocol, which avoids the multiple rounds of communication to compute intersection set members. Moreover, benefited from our specially designed PSM protocol, SC-PSI does not to execute complex secure comparison circuits in the CC phase. Experimentally, we validate that compared to prior works, SC-PSI can save around 61.39% running time and 89.61% communication overhead.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 5","pages":"1408-1419"},"PeriodicalIF":7.2,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44411408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards an Energy Complexity Model for Distributed Data Processing Algorithms","authors":"Jie Song;Xingchen Zhao;Chaopeng Guo;Yu Gu;Ge Yu","doi":"10.1109/TBDATA.2023.3284259","DOIUrl":"10.1109/TBDATA.2023.3284259","url":null,"abstract":"Modern data centers exist as infrastructure in the era of Big Data. Big data processing applications are the major computing workload of data centers. Electricity cost accounts for about 50% of data centers’ operational costs. Therefore, the energy consumed for running distributed data processing algorithms on a data center is starting to attract both academia and industry. Most works study the energy consumption from the hardware perspective and only a few of them from the algorithm perspective. A general and hardware-independent energy evaluation model for the algorithms is in demand. With the model, algorithm designers can evaluate the energy consumption, compare energy consumption features and facilitate energy consumption optimization of distributed data processing algorithms. Inspired by the time complexity model, we propose an energy complexity model for describing the trends that an algorithm's energy consumption grows with the algorithm's input size. We argue that a good algorithm, especially for processing Big Data, should have a ‘small’ energy complexity. We define \u0000<inline-formula><tex-math>$E(n)$</tex-math></inline-formula>\u0000 to represent the functional relationship that associates an algorithm's input size \u0000<inline-formula><tex-math>$n$</tex-math></inline-formula>\u0000 with its notional energy consumption \u0000<inline-formula><tex-math>$E$</tex-math></inline-formula>\u0000. Based on the well-known abstract Bulk Synchronous Parallel (BSP) computer and programming model, we present a complete \u0000<inline-formula><tex-math>$E(n)$</tex-math></inline-formula>\u0000 solution, including abstraction, generalization, quantification, derivation, comparison, analysis, examples, verification, and applications. Comprehensive experimental analysis shows that the proposed energy complexity model is practical, interestingly, and not equivalent to time complexity.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1510-1524"},"PeriodicalIF":7.2,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-Agnostic Method: Exposing Deepfake Using Pixel-Wise Spatial and Temporal Fingerprints","authors":"Jun Yang;Yaoru Sun;Maoyu Mao;Lizhi Bai;Siyu Zhang;Fang Wang","doi":"10.1109/TBDATA.2023.3284272","DOIUrl":"10.1109/TBDATA.2023.3284272","url":null,"abstract":"Deepfake poses a serious threat to the reliability of judicial evidence and intellectual property protection. Existing detection methods either blindly utilize deep learning or use biosignal features, but neither considers spatial and temporal relevance of face features. These methods are increasingly unable to resist the growing realism of fake videos and lack generalization. In this paper, we identify a reliable fingerprint through the consistency of AR coefficients and extend the original PPG signal to 3-dimensional fingerprints to effectively detect fake content. Using these reliable fingerprints, we propose a novel model-agnostic method to expose Deepfake by analyzing temporal and spatial faint synthetic signals hidden in portrait videos. Specifically, our method extracts two types of faint information, i.e., PPG features and AR features, which are used as the basis for forensics in temporal and spatial domains, respectively. PPG allows remote estimation of the heart rate in face videos, and irregular heart rate fluctuations expose traces of tampering. AR coefficients reflect pixel-wise correlation and spatial traces of smoothing caused by up-sampling in the process of generating fake faces. Furthermore, we employ two ACBlock-based DenseNets as classifiers. Our method provides state-of-the-art performance on multiple deep forgery datasets and demonstrates better generalization.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1496-1509"},"PeriodicalIF":7.2,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}