{"title":"Efficient Algorithms for Minimizing the Kirchhoff Index via Adding Edges","authors":"Xiaotian Zhou;Ahad N. Zehmakan;Zhongzhi Zhang","doi":"10.1109/TKDE.2025.3552644","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3552644","url":null,"abstract":"The Kirchhoff index, which is the sum of the resistance distances between all pairs of nodes in a network, is a key metric for gauging network performance, where lower values signify enhanced performance. In this paper, we study the problem of minimizing the Kirchhoff index by adding edges. We first provide a greedy algorithm for solving this problem and give an analysis of its quality based on the bounds of the submodularity ratio and the curvature. Then, we introduce a gradient-based greedy algorithm as a new paradigm to solve this problem. To reduce the computation cost, we leverage geometric properties, convex hull approximation, and approximation of the projected coordinate of each point. To further improve this algorithm, we use pre-pruning and fast update techniques, making it particularly suitable for large networks. Our proposed algorithms have nearly-linear time complexity. We provide extensive experiments on ten real networks to evaluate the quality of our algorithms. The results demonstrate that our proposed algorithms outperform the state-of-the-art methods in terms of efficiency and effectiveness. 
Moreover, our algorithms are scalable to large graphs with over 5 million nodes and 12 million edges.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3342-3355"},"PeriodicalIF":8.9,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LOFTune: A Low-Overhead and Flexible Approach for Spark SQL Configuration Tuning","authors":"Jiahui Li;Junhao Ye;Yuren Mao;Yunjun Gao;Lu Chen","doi":"10.1109/TKDE.2025.3549232","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3549232","url":null,"abstract":"The query efficiency of Spark SQL is significantly impacted by its configurations. Therefore, configuration tuning has drawn great attention, and various automatic configuration tuning methods have been proposed. However, existing methods suffer from two issues: (1) high tuning overhead: they need to execute the workloads repeatedly to obtain the training samples, which is time-consuming; and (2) low throughput: they need to occupy resources like CPU cores and memory for a long time, causing other Spark SQL workloads to wait, thereby reducing the overall system throughput. These issues impede the use of automatic configuration tuning methods in practical systems that have limited tuning budgets and many concurrent workloads. To address these issues, this paper proposes a <bold>L</bold>ow-<bold>O</bold>verhead and <bold>F</bold>lexible approach for Spark SQL configuration <bold>Tuning</bold>, dubbed <bold>LOFTune</bold>. LOFTune reduces the tuning overhead via a sample-efficient optimization framework, which is built on multi-task SQL representation learning and multi-armed bandits. Furthermore, LOFTune solves the low throughput issue with a recommendation-sampling-decoupled tuning framework. Extensive experiments validate the effectiveness of LOFTune. In the sampling-allowed case, LOFTune can save up to 90% of the workload runs compared with the state-of-the-art methods. 
Besides, in the zero-sampling case, LOFTune can reduce latency by up to 41.26%.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3528-3542"},"PeriodicalIF":8.9,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zkfhed: A Verifiable and Scalable Blockchain-Enhanced Federated Learning System","authors":"Bingxue Zhang;Guangguang Lu;Yuncheng Wu;Kunpeng Ren;Feida Zhu","doi":"10.1109/TKDE.2025.3550546","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3550546","url":null,"abstract":"Federated learning (FL) is an emerging paradigm that enables multiple clients to collaboratively train a machine learning (ML) model without the need to exchange their raw data. However, it relies on a centralized authority to coordinate participants’ activities. This not only interrupts the entire training task in case of a single point of failure, but also lacks an effective regulatory mechanism to prevent malicious behavior. Although blockchain, with its decentralized architecture and data immutability, has significantly advanced the development of FL, it still struggles to withstand poisoning attacks and faces limitations in computational scalability. We propose Zkfhed, a verifiable and scalable FL system that overcomes the limitations of blockchain-based FL in poison attacks and computational scalability. First, we propose a two-stage audit scheme based on zero-knowledge proofs (ZKPs), which verifies that the training data are extracted from trusted organizations and that computations on the data exactly follow the specified training protocols. Second, we propose a homomorphic encryption delegation learning (HEDL), based on fully homomorphic encryption (FHE). It is capable of outsourcing complex computing to external computing resources without sacrificing the client's data privacy. 
Finally, extensive experiments on real-world datasets demonstrate that Zkfhed can effectively identify malicious clients and is highly efficient and scalable in terms of online time and communication efficiency.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3841-3854"},"PeriodicalIF":8.9,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143902652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiscale Weisfeiler-Leman Directed Graph Neural Networks for Prerequisite-Link Prediction","authors":"Yupei Zhang;Xiran Qu;Shuhui Liu;Yan Pang;Xuequn Shang","doi":"10.1109/TKDE.2025.3552045","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3552045","url":null,"abstract":"Prerequisite-link Prediction (PLP) aims to discover the condition relations of a specific event or a concerned variable, which is a fundamental problem in a large number of fields, such as educational data mining. Current studies on PLP usually develop graph neural networks (GNNs) to learn the representations of pairs of nodes. However, these models fail to distinguish non-isomorphic graphs and to integrate multiscale structures, which limits the expressive capability of GNNs. To this end, this paper proposes <italic>k</italic>-dimensional Weisfeiler-Leman directed GNNs, dubbed <italic>k</italic>-WediGNNs, to recognize non-isomorphic graphs via the Weisfeiler-Leman algorithm. Furthermore, we integrate the multiscale structures of a directed graph into <italic>k</italic>-WediGNNs, dubbed multiscale <italic>k</italic>-WediGNNs, from the bidirected views of in-degree and out-degree. With the Siamese network, the proposed models are extended to address the problem of PLP. Besides, the expressive power is then interpreted via theoretical proofs. The experiments were conducted on four publicly available datasets for concept prerequisite relation prediction (CPRP). 
The results show that the proposed models achieve better performance than the state-of-the-art approaches, where our multiscale <italic>k</italic>-WediGNN achieves a new benchmark in the task of CPRP.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3556-3569"},"PeriodicalIF":8.9,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Final: Combining First-Order Logic With Natural Logic for Question Answering","authors":"Jihao Shi;Xiao Ding;Siu Cheung Hui;Yuxiong Yan;Hengwei Zhao;Ting Liu;Bing Qin","doi":"10.1109/TKDE.2025.3551231","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3551231","url":null,"abstract":"Many question-answering problems can be approached as textual entailment tasks, where the hypotheses are formed by the question and candidate answers, and the premises are derived from an external knowledge base. However, current neural methods often lack transparency in their decision-making processes. Moreover, first-order logic methods, while systematic, struggle to integrate unstructured external knowledge. To address these limitations, we propose a neuro-symbolic reasoning framework called <italic><small>Final</small></italic>, which combines <underline><b>FI</b></underline>rst-order logic with <underline><b>NA</b></underline>tural <underline><b>L</b></underline>ogic for question answering. Our framework utilizes <italic>first-order logic</italic> to systematically decompose hypotheses and <italic>natural logic</italic> to construct reasoning paths from premises to hypotheses, employing bidirectional reasoning to establish links along the reasoning path. This approach not only enhances interpretability but also effectively integrates unstructured knowledge. Our experiments on three benchmark datasets, namely QASC, WorldTree, and WikiHop, demonstrate that <sc>Final</sc> outperforms existing methods in commonsense reasoning and reading comprehension tasks, achieving state-of-the-art results. 
Additionally, our framework also provides transparent reasoning paths that elucidate the rationale behind the correct decisions.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3103-3117"},"PeriodicalIF":8.9,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey on Point-of-Interest Recommendation: Models, Architectures, and Security","authors":"Qianru Zhang;Peng Yang;Junliang Yu;Haixin Wang;Xingwei He;Siu-Ming Yiu;Hongzhi Yin","doi":"10.1109/TKDE.2025.3551292","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3551292","url":null,"abstract":"The widespread adoption of smartphones and Location-Based Social Networks has led to a massive influx of spatio-temporal data, creating unparalleled opportunities for enhancing Point-of-Interest (POI) recommendation systems. These advanced POI systems are crucial for enriching user experiences, enabling personalized interactions, and optimizing decision-making processes in the digital landscape. However, existing surveys tend to focus on traditional approaches and few of them delve into cutting-edge developments, emerging architectures, as well as security considerations in POI recommendations. To address this gap, our survey stands out by offering a comprehensive, up-to-date review of POI recommendation systems, covering advancements in models, architectures, and security aspects. We systematically examine the transition from traditional models to advanced techniques such as large language models. Additionally, we explore the architectural evolution from centralized to decentralized and federated learning systems, highlighting the improvements in scalability and privacy. Furthermore, we address the increasing importance of security, examining potential vulnerabilities and privacy-preserving approaches. 
Our taxonomy provides a structured overview of the current state of POI recommendation, while we also identify promising directions for future research in this rapidly advancing field.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3153-3172"},"PeriodicalIF":8.9,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual Test-Time Training for Out-of-Distribution Recommender System","authors":"Xihong Yang;Yiqi Wang;Jin Chen;Wenqi Fan;Xiangyu Zhao;En Zhu;Xinwang Liu;Defu Lian","doi":"10.1109/TKDE.2025.3548160","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3548160","url":null,"abstract":"Deep learning has been widely applied in recommender systems, which have recently achieved revolutionary progress. However, most existing learning-based methods assume that the user and item distributions remain unchanged between the training phase and the test phase. In real-world scenarios, the distribution of user and item features can naturally shift, potentially resulting in a substantial decrease in recommendation performance. This phenomenon can be formulated as an Out-Of-Distribution (OOD) recommendation problem. To address this challenge, we propose a novel <bold>D</bold>ual <bold>T</bold>est-<bold>T</bold>ime-<bold>T</bold>raining framework for <bold>O</bold>OD <bold>R</bold>ecommendation, termed <bold>DT3OR</bold>. In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model, allowing the model to adapt specifically to the shifting user and item features. To be specific, we propose a self-distillation task and a contrastive task to help the model learn both the user’s invariant interest preferences and the variant user/item characteristics during the test-time phase, thus facilitating a smooth adaptation to the shifting features. Furthermore, we provide theoretical analysis to support the rationale behind our dual test-time training framework. To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy. We conduct experiments on five datasets with various backbones. 
Comprehensive experimental results have demonstrated the effectiveness of DT3OR compared to other state-of-the-art baselines.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3312-3326"},"PeriodicalIF":8.9,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pricing for Data Assets Based on Data Quality, Quantity and Utility on the Perspective of Consumer Heterogeneity","authors":"Juanjuan Lin;Zhigang Huang;Yong Tang","doi":"10.1109/TKDE.2025.3551401","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3551401","url":null,"abstract":"Transforming data into data assets and enabling their trade and circulation is an inevitable trend in the development of the global digital economy. To release data value and support the development of the data transaction process, the concept of an integrated data score is proposed by combining an integrated quality index containing four dimensions with data quantity. On this basis, data assets are priced according to the principle of profit maximization by constructing a nonlinear programming model. Three types of pricing models are distinguished according to the heterogeneity of consumers’ utility sensitivity, and consumers’ willingness to pay is adjusted based on business parameters using the FAHP system. The proposed model is verified using data on China's carbon emissions, combined with the KNN machine learning algorithm and a series of simulation analyses. In addition, multiple sets of heterogeneous data are tested. The results show that the quality, quantity and utility of data have an important impact on the pricing of data assets, and that it is necessary to distinguish consumers by utility sensitivity and to take business parameters into consideration. 
The proposed model can also provide a decision-making reference for data platforms.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3641-3652"},"PeriodicalIF":8.9,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DRLPG: Reinforced Opponent-Aware Order Pricing for Hub Mobility Services","authors":"Zuohan Wu;Chen Jason Zhang;Han Yin;Rui Meng;Libin Zheng;Huaijie Zhu;Wei Liu","doi":"10.1109/TKDE.2025.3551147","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3551147","url":null,"abstract":"A modern service model known as the “hub-oriented” model has emerged with the development of mobility services. This model allows users to request vehicles from multiple companies (agents) simultaneously through a unified entry (a ‘hub’). In contrast to conventional services, the “hub-oriented” model emphasizes pricing competition. To address this scenario, an agent should consider its competitors when developing its pricing strategy. In this paper, we introduce DRLPG, a mixed opponent-aware pricing method, which consists of two main components: the two-stage guarantor and the end-to-end deep reinforcement learning (DRL) module, as well as interaction mechanisms. In the guarantor, we design a prediction-decision framework. Specifically, we propose a new objective function for the spatiotemporal neural network in the prediction stage and utilize a traditional reinforcement learning method in the decision stage, respectively. In the end-to-end DRL framework, we explore the adoption of conventional DRL in the “hub-oriented” scenario. Finally, a meta-decider and an experience-sharing mechanism are proposed to combine both methods and leverage their advantages. We conduct extensive experiments on real data, and DRLPG achieves an average improvement of 99.9% and 61.1% in the peak and low peak periods, respectively. 
Our results demonstrate the effectiveness of our approach compared to the baseline.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3298-3311"},"PeriodicalIF":8.9,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hard or False: Keep the Balance for Negative Sampling in Knowledge Graphs","authors":"Feihu Che;Jianhua Tao;Qionghai Dai","doi":"10.1109/TKDE.2025.3550545","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3550545","url":null,"abstract":"Negative sampling is an essential part of knowledge graph embedding, which offers significant advantages to numerous downstream related tasks. There are two kinds of important negatives: hard and false negatives. Hard negatives are the negatives which are difficult to distinguish from positive samples, while false negatives are positive samples which are mistakenly identified as negatives. Harnessing hard negatives effectively can make the model more discriminative, and reducing false negatives can avoid misleading the model during training. Therefore, the two kinds of negatives are essential in high-quality negative sampling. However, existing negative sampling methods face two shortcomings: 1. judging whether a negative is hard or false relies mainly on score functions; 2. it is difficult to balance the impact of hard and false negatives. In this paper, we draw on the bigram language model and propose a novel criterion to help determine whether negatives are hard or false, and discuss how to keep the balance between hard and false negatives. 
Experiments on four representative score functions and two public datasets demonstrate the effects of the proposed negative sampling method.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3445-3456"},"PeriodicalIF":8.9,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}