{"title":"Online Dynamic Hybrid Broad Learning System for Real-Time Safety Assessment of Dynamic Systems","authors":"Zeyi Liu;Xiao He","doi":"10.1109/TKDE.2024.3475028","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3475028","url":null,"abstract":"Real-time safety assessment of dynamic systems is of paramount importance in industrial processes since it provides continuous monitoring and evaluation to prevent potential harm to the environment and individuals. However, there are still several challenges to be resolved due to the requirements of time consumption and the non-stationary nature of real-world environments. In this paper, a novel online dynamic hybrid broad learning system, termed ODH-BLS, is proposed to more fully utilize the co-design advantages of active adaptation and passive adaptation. It makes effective use of limited annotations with the proposed sample value function. Simultaneously, anchor points can be dynamically adjusted to accommodate changes of the underlying distribution, thereby leveraging the value of unlabeled samples. An iterative update rule is also derived to ensure adaptation of the assessment model to real-time data at low computational costs. We also provide theoretical analyses to illustrate its practicality. Several experiments regarding the JiaoLong deep-sea manned submersible are carried out. 
The results demonstrate that the proposed ODH-BLS method achieves a performance improvement of approximately 8% over the baseline method on the benchmark dataset, showing its effectiveness in solving real-time safety assessment tasks for dynamic systems.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8928-8938"},"PeriodicalIF":8.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SE Factual Knowledge in Frozen Giant Code Model: A Study on FQN and Its Retrieval","authors":"Qing Huang;Dianshu Liao;Zhenchang Xing;Zhiqiang Yuan;Qinghua Lu;Xiwei Xu;Jiaxing Lu","doi":"10.1109/TKDE.2024.3436883","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3436883","url":null,"abstract":"Giant pre-trained code models (PCMs) are starting to enter developers’ daily practice. Understanding the type and amount of software knowledge in PCMs is essential for integrating PCMs into software engineering (SE) tasks and unlocking their potential. In this work, we conduct the first systematic study on the SE factual knowledge in the state-of-the-art PCM CoPilot, focusing on APIs’ Fully Qualified Names (FQNs), the fundamental knowledge for effective code analysis, search and reuse. Driven by FQNs’ data distribution properties, we design a novel lightweight in-context learning method on Copilot for FQN inference, which requires neither the code compilation of traditional methods nor the gradient updates of recent FQN prompt-tuning. We systematically experiment with five in-context learning design factors to identify the best configuration for practical use. With this best configuration, we investigate the impact of example prompts and FQN data properties on CoPilot's FQN inference capability. Our results confirm that CoPilot stores diverse FQN knowledge and can be applied for FQN inference due to its high accuracy and non-reliance on code analysis. Additionally, our extended study shows that the in-context learning method can be generalized to retrieve other SE factual knowledge embedded in giant PCMs. Furthermore, we find that the advanced general model GPT-4 also stores substantial SE knowledge. Comparing FQN inference between CoPilot and GPT-4, we observe that as model capabilities improve, the same prompts yield better results.
Based on our experience interacting with Copilot, we discuss various opportunities to improve human-CoPilot interaction in the FQN inference task.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9220-9234"},"PeriodicalIF":8.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In Search of a Memory-Efficient Framework for Online Cardinality Estimation","authors":"Xun Song;Jiaqi Zheng;Hao Qian;Shiju Zhao;Hongxuan Zhang;Xuntao Pan;Guihai Chen","doi":"10.1109/TKDE.2024.3486571","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3486571","url":null,"abstract":"Estimating per-flow cardinality from high-speed data streams has many applications such as anomaly detection and resource allocation. Although approximation algorithms exist for tracking single-flow cardinality, algorithmic challenges remain for monitoring multi-flows, especially under unbalanced cardinality distribution: existing methods adopt a uniform sketch layout and incur a large memory footprint to achieve high accuracy. Furthermore, they are hard to implement in the compact hardware used for line-rate processing. In this paper, we propose Couper, a memory-efficient measurement framework that can estimate cardinality for multi-flows under unbalanced cardinality distribution. We propose a two-layer structure based on a classic coupon collector's principle, where numerous mice flows are confined to the first layer and only the potential elephant flows are allowed to enter the second layer. Our two-layer structure can better fit the unbalanced cardinality distribution in practice and achieve much higher memory efficiency. We implement Couper in both software and hardware.
Extensive evaluation under real-world and synthetic data traces shows more than 20× improvements in terms of memory efficiency compared to the state of the art.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 1","pages":"392-407"},"PeriodicalIF":8.9,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"B-CAVE: A Robust Online Time Series Change Point Detection Algorithm Based on the Between-Class Average and Variance Evaluation Approach","authors":"Aditi Gupta;Adeiza James Onumanyi;Satyadev Ahlawat;Yamuna Prasad;Virendra Singh","doi":"10.1109/TKDE.2024.3492339","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3492339","url":null,"abstract":"Change point detection (CPD) is a valuable technique in time series (TS) analysis, which allows for the automatic detection of abrupt variations within the TS. It is often useful in applications such as fault, anomaly, and intrusion detection systems. However, the inherent unpredictability and fluctuations in many real-time data sources pose a challenge for existing contemporary CPD techniques, leading to inconsistent performance across diverse real-time TS with varying characteristics. To address this challenge, we have developed a novel and robust online CPD algorithm constructed from the principle of discriminant analysis and based upon a newly proposed between-class average and variance evaluation approach, termed B-CAVE. Our B-CAVE algorithm features a unique change point measure, which has only one tunable parameter (i.e. the window size) in its computational process. We have also proposed a new evaluation metric that integrates time delay and the false alarm error towards effectively comparing the performance of different CPD methods in the literature. 
To validate the effectiveness of our method, we conducted experiments using both synthetic and real datasets, demonstrating the superior performance of the B-CAVE algorithm over other prominent existing techniques.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 1","pages":"75-88"},"PeriodicalIF":8.9,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context Correlation Discrepancy Analysis for Graph Anomaly Detection","authors":"Ruidong Wang;Liang Xi;Fengbin Zhang;Haoyi Fan;Xu Yu;Lei Liu;Shui Yu;Victor C. M. Leung","doi":"10.1109/TKDE.2024.3488375","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3488375","url":null,"abstract":"In unsupervised graph anomaly detection, existing methods usually focus on detecting outliers by learning local context information of nodes, while often ignoring the importance of global context. However, global context information can provide more comprehensive relationship information between nodes in the network. By considering the structure of the entire network, detection methods are able to identify potential dependencies and interaction patterns between nodes, which is crucial for anomaly detection. Therefore, we propose an innovative graph anomaly detection framework, termed CoCo (Context Correlation Discrepancy Analysis), which detects anomalies by meticulously evaluating variances in correlations. Specifically, CoCo leverages the strengths of Transformers in sequence processing to effectively capture both global and local contextual features of nodes by aggregating neighbor features at various hops. Subsequently, a correlation analysis module is employed to maximize the correlation between local and global contexts of each normal node. Unseen anomalies are ultimately detected by measuring the discrepancy in the correlation of nodes’ contextual features. 
Extensive experiments conducted on six datasets with synthetic outliers and five datasets with organic outliers have demonstrated the significant effectiveness of CoCo compared to existing methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 1","pages":"174-187"},"PeriodicalIF":8.9,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CAGS: Context-Aware Document Ranking With Contrastive Graph Sampling","authors":"Zhaoheng Huang;Yutao Zhu;Zhicheng Dou;Ji-Rong Wen","doi":"10.1109/TKDE.2024.3491996","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3491996","url":null,"abstract":"In search sessions, a series of interactions in the context has been proven to be advantageous in capturing users’ search intents. Existing studies show that designing pre-training tasks and data augmentation strategies for session search improves the robustness and generalizability of the model. However, such data augmentation strategies only focus on changing the original session structure to learn a better representation. Ignoring information from outside the session, users’ diverse and complex intents cannot be learned well by simply reordering and deleting historical behaviors, proving that such strategies are limited and inadequate. In order to solve the problem of insufficient modeling under complex user intents, we propose exploiting information outside the original session. More specifically, in this paper, we sample queries and documents from the global click-on and follow-up session graph, alter an original session with these samples, and construct a new session that shares a similar user intent with the original one. Specifically, we design four data augmentation strategies based on session graphs in view of both one-hop and multi-hop structures to sample intent-associated query/document nodes. 
Experiments conducted on three large-scale public datasets demonstrate that our model outperforms the existing ad-hoc and context-aware document ranking models.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 1","pages":"89-101"},"PeriodicalIF":8.9,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TGformer: A Graph Transformer Framework for Knowledge Graph Embedding","authors":"Fobo Shi;Duantengchuan Li;Xiaoguang Wang;Bing Li;Xindong Wu","doi":"10.1109/TKDE.2024.3486747","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3486747","url":null,"abstract":"Knowledge graph embedding is an efficient method for reasoning over known facts and inferring missing links. Existing methods are mainly triplet-based or graph-based. Triplet-based approaches learn the embedding of missing entities by a single triple only. They ignore the fact that the knowledge graph is essentially a graph structure. Graph-based methods consider graph structure information but ignore the contextual information of nodes in the knowledge graph, making them unable to discern valuable entity (relation) information. In response to the above limitations, we propose a general graph transformer framework for knowledge graph embedding (TGformer). It is the first to use a graph transformer to build knowledge embeddings with triplet-level and graph-level structural features in the static and temporal knowledge graph. Specifically, a context-level subgraph is constructed for each predicted triplet, which models the relation between triplets with the same entity. Afterward, we design a knowledge graph transformer network (KGTN) to fully explore multi-structural features in knowledge graphs, including triplet-level and graph-level, boosting the model to understand entities (relations) in different contexts. Finally, semantic matching is adopted to select the entity with the highest score.
Experimental results on several public knowledge graph datasets show that our method can achieve state-of-the-art performance in link prediction.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 1","pages":"526-541"},"PeriodicalIF":8.9,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable Session-Based Recommendation via Path Reasoning","authors":"Yang Cao;Shuo Shang;Jun Wang;Wei Zhang","doi":"10.1109/TKDE.2024.3486326","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3486326","url":null,"abstract":"This paper explores explaining session-based recommendation (SR) by path reasoning. Current SR models emphasize accuracy but lack explainability, while traditional path reasoning prioritizes knowledge graph exploration, ignoring sequential patterns present in the session history. Therefore, we propose a generalized hierarchical reinforcement learning framework for SR, which improves the explainability of existing SR models via Path Reasoning, namely PR4SR. Considering the different importance of items to the session, we design the session-level agent to select the items in the session as the starting nodes for path reasoning and the path-level agent to perform path reasoning. In particular, we design a multi-target reward mechanism to adapt to the skip behaviors of sequential patterns in SR and introduce path midpoint reward to enhance the exploration efficiency and accuracy in knowledge graphs. To improve the knowledge graph’s completeness and diversify the paths of explanation, we incorporate extracted feature information from images into the knowledge graph. 
We instantiate PR4SR in five state-of-the-art SR models (i.e., GRU4REC, NARM, GCSAN, SR-GNN, SASRec) and compare it with other explainable SR frameworks to demonstrate the effectiveness of PR4SR for recommendation and explanation tasks through extensive experiments with these approaches on four datasets.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 1","pages":"278-290"},"PeriodicalIF":8.9,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPM: Multi Patterns Memory Model for Short-Term Time Series Forecasting","authors":"Dezheng Wang;Rongjie Liu;Congyan Chen;Shihua Li","doi":"10.1109/TKDE.2024.3490843","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3490843","url":null,"abstract":"Short-term time series forecasting is pivotal in various scientific and industrial fields. Recent advancements in deep learning-based technologies have significantly improved the efficiency and accuracy of short-term time series modeling. Despite these advancements, current short-term time series forecasting methods typically emphasize modeling dependencies across time stamps but frequently overlook inter-variable dependencies, which are crucial for multivariate forecasting. To fill this gap, we propose a multi patterns memory model that discovers various dependency patterns for short-term multivariate time series forecasting. The proposed model is structured around two key components: the short-term memory block and the long-term memory block. These networks are distinctively characterized by their use of asymmetric convolution, each tailored to process the various spatial-temporal dependencies among data.
Experimental results show that the proposed model demonstrates competitive performance over the other time series forecasting methods across five benchmark datasets, likely thanks to the asymmetric structure, which can effectively extract the underlying various spatial-temporal dependencies among data.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 1","pages":"438-448"},"PeriodicalIF":8.9,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Ensemble Based on Fuzzy Matrix Self-Enhancement","authors":"Xia Ji;Jiawei Sun;Jianhua Peng;Yue Pang;Peng Zhou","doi":"10.1109/TKDE.2024.3489553","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3489553","url":null,"abstract":"Fuzzy clustering ensemble techniques have been proven to yield more accurate and robust clustering results, with the mainstream methods relying on the fuzzy co-association (FCA) matrix. However, the inherent issues of low-value density and uniform dispersion in the FCA matrix significantly affect the performance of fuzzy clustering ensembles, an aspect that has been overlooked. To address this issue, we propose a novel framework for fuzzy clustering ensemble based on fuzzy matrix self-enhancement (FMSE). Specifically, we initially employ singular value decomposition to extract the principal components of the FCA matrix, thereby alleviating its low-value density. Second, on the basis of the criterion of fuzzy entropy, we measure the fuzziness of samples, design a metric for the fuzzy representativeness of samples, and incorporate it into a fusion-weighted structure for the reconstruction of the FCA matrix, mitigating uniform dispersion. Subsequently, on the basis of the self-enhanced fuzzy matrix model, we utilize a prototype diffusion approach to identify core samples and gradually allocate remaining samples to obtain a consensus clustering solution. 
Extensive comparative experiments on benchmark datasets against state-of-the-art clustering ensemble methods demonstrate the effectiveness and superiority of the proposed approach.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 1","pages":"148-161"},"PeriodicalIF":8.9,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}