IEEE Transactions on Knowledge and Data Engineering: Latest Articles

Early Detection of Malicious Crypto Addresses With Asset Path Tracing and Selection
IF 8.9, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-12-25 DOI: 10.1109/TKDE.2024.3522772
Ling Cheng;Feida Zhu;Qian Shao;Jiashu Pu;Fengzhu Zeng
Abstract: In response to the burgeoning cryptocurrency sector and its associated financial risks, there is a growing focus on detecting fraudulent activities and malicious addresses. Traditional studies are limited by their reliance on comprehensive historical data and address-wise manipulation, which are not available for early malice detection and fail to identify addresses controlled by the same fraudulent entity. We thus introduce Evolve Path Tracer, a novel solution designed for early malice detection in cryptocurrency. This system incorporates Asset Transfer Paths and corresponding path graphs in an evolve model, which effectively characterize rapidly evolving transaction patterns. First, for the target address, the Clustering-based Path Selector weights each Asset Transfer Path by finding sibling addresses along the Asset Transfer Paths. Evolve Path Encoder LSTM and Evolve Path Graph GCN then encode the asset transfer path and path graph within a dynamic structure. Additionally, our Hierarchical Survival Predictor efficiently scales to predict the address labels, demonstrating high scalability and efficiency. We rigorously tested Evolve Path Tracer on three real-world datasets of malicious addresses, where it consistently outperformed existing state-of-the-art methods. Our extensive scalability tests further confirmed the model's robust adaptability in dynamic prediction environments, highlighting its potential as a significant tool in the realm of cryptocurrency security.
Vol. 37, No. 3, pp. 1154-1166
Citations: 0
HidAttack: An Effective and Undetectable Model Poisoning Attack to Federated Recommenders
IF 8.9, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-12-25 DOI: 10.1109/TKDE.2024.3522763
Waqar Ali;Khalid Umer;Xiangmin Zhou;Jie Shao
Abstract: Privacy concerns in recommender systems are potentially addressed due to constitutional and commercial requirements. Centralized recommendation models are susceptible to poisoning attacks, which threaten their integrity. In this context, federated learning has emerged as an optimal solution to privacy concerns. However, recent investigations proved that Federated Recommender Systems (FedRS) are also vulnerable to model poisoning attacks. Existing attack possibilities highlighted in academic literature require a large fraction of Byzantine clients to effectively influence the training process, which is unrealistic for practical systems with millions of users. Additionally, most attack models neglected the role of the defense mechanism running at the aggregation server. To this end, we propose a novel undetectable hidden attack strategy (HidAttack) for FedRS, aiming to raise the exposure ratio of targeted items with minimum Byzantine clients. To achieve this goal, we construct a cluster of baseline attacks, on top of which a bandit model is designed that intelligently infers effective poisoned gradients. It ensures a diverse pattern of poisoned gradients and, therefore, Byzantine clients cannot be distinguished from benign clients by the defense mechanism. Extensive experiments demonstrate that: 1) our attack model significantly increases the target item's exposure rate covertly without compromising the recommendation accuracy and 2) the current defenses are insufficient, emphasizing the need for better security improvements against our model poisoning attack to FedRS.
Vol. 37, No. 3, pp. 1227-1240
Citations: 0
LDGI: Location-Discriminative Geo-Indistinguishability for Location Privacy
IF 8.9, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-12-25 DOI: 10.1109/TKDE.2024.3522320
Youwen Zhu;Yuanyuan Hong;Qiao Xue;Xiao Lan;Yushu Zhang;Yong Xiang
Abstract: Geo-Indistinguishability (GI) is a powerful privacy model that can effectively protect location information by limiting the ability of an attacker to infer a user's true location. In real life, locations usually have different sensitivity levels in terms of privacy; for example, shopping malls might be low-sensitivity while home addresses might be high-sensitivity for users. But the GI model does not consider the various sensitivity levels of locations, and implements the same perturbation on all locations to meet the highest privacy requirement. This would cause overprotection of low-sensitivity locations and reduce data utility. To strike a good balance between privacy and utility, in this paper, we propose a novel privacy notion, termed Location-Discriminative Geo-Indistinguishability (LDGI), which takes into account different sensitivity levels of location privacy. With the LDGI model, we then develop a perturbation scheme called EM-LDGI based on the exponential mechanism, and an advanced scheme, MinQL, to further enhance data utility. To improve the efficiency of the proposed schemes, we design a scheme MinQL-S with the assistance of the spanner graph, at the cost of a slight utility degradation. We theoretically analyze that the proposed schemes satisfy LDGI and evaluate their performance by extensive experiments on both synthetic and real datasets. The comparison with GI mechanisms demonstrates the advantages of the LDGI model.
Vol. 37, No. 3, pp. 1282-1293
Citations: 0
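As background for the LDGI abstract above, the following is a minimal sketch of the standard exponential mechanism for plain Geo-Indistinguishability over a finite candidate set. It is not the paper's EM-LDGI scheme: the function names, the Euclidean distance, and the uniform privacy budget `epsilon` are assumptions made for illustration, and the per-location sensitivity levels that LDGI introduces are not modelled here.

```python
import math
import random

def em_distribution(true_loc, candidates, epsilon):
    """Output probabilities proportional to exp(-epsilon * d(x, z) / 2).

    The factor 1/2 accounts for the normalising constant, so the ratio of
    probabilities for two true locations x, x' is at most exp(epsilon * d(x, x')).
    """
    weights = [math.exp(-epsilon * math.dist(true_loc, z) / 2) for z in candidates]
    total = sum(weights)
    return [w / total for w in weights]

def perturb(true_loc, candidates, epsilon, rng=random):
    """Sample one reported location from the exponential-mechanism distribution."""
    probs = em_distribution(true_loc, candidates, epsilon)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

Because every location is perturbed with the same `epsilon`, this baseline illustrates the uniform protection that the abstract argues overprotects low-sensitivity locations.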
Label-Aware Causal Feature Selection
IF 8.9, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-12-25 DOI: 10.1109/TKDE.2024.3522580
Zhaolong Ling;Jingxuan Wu;Yiwen Zhang;Peng Zhou;Xingyu Wu;Kui Yu;Xindong Wu
Abstract: Causal feature selection has recently received increasing attention in machine learning and data mining, especially in the era of Big Data. Existing causal feature selection algorithms select unique causal features of the single class label as the optimal feature subset. However, a single class label usually has multiple classes, and it is unreasonable to select the same causal features for different classes of a single class label. To address this problem, we employ the class-specific mutual information to evaluate the causal information carried by each class of the single class label, and theoretically analyze the unique relationship between each class and the causal features. Based on this, a Label-aware Causal Feature Selection algorithm (LaCFS) is proposed to identify the causal features for each class of the class label. Specifically, LaCFS uses the pairwise comparisons of class-specific mutual information and the size of class-specific mutual information values from the perspective of each class, and follows a divide-and-conquer framework to find causal features. The correctness and application condition of LaCFS are theoretically proved, and extensive experiments are conducted to demonstrate the efficiency and superiority of LaCFS compared to the state-of-the-art approaches.
Vol. 37, No. 3, pp. 1268-1281
Citations: 0
An Extensive Survey With Empirical Studies on Deep Temporal Point Process
IF 8.9, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-12-24 DOI: 10.1109/TKDE.2024.3522114
Haitao Lin;Cheng Tan;Lirong Wu;Zicheng Liu;Zhangyang Gao;Stan Z. Li
Abstract: Temporal point process, as the stochastic process on a continuous domain of time, is commonly used to model the asynchronous event sequence featuring occurrence timestamps. Thanks to the strong expressivity of deep neural networks, they are emerging as a promising choice for capturing the patterns in asynchronous sequences, in the context of temporal point process. In this paper, we first review recent research emphasis and difficulties in modeling asynchronous event sequences with deep temporal point process, which can be grouped into four fields: encoding of history sequence, formulation of conditional intensity function, relational discovery of events, and learning approaches for optimization. We introduce most of the recently proposed models by dismantling them into four parts and conduct experiments by re-modularizing the first three parts with the same learning strategy for a fair empirical evaluation. Besides, we extend the history encoders and conditional intensity function family and propose a Granger causality discovery framework for exploiting the relations among multi-types of events. Because the Granger causality can be represented by the Granger causality graph, discrete graph structure learning in the framework of Variational Inference is employed to reveal latent structures of the graph. Further experiments show that the proposed framework with latent graph discovery can both capture the relations and achieve an improved fitting and predicting performance.
Vol. 37, No. 4, pp. 1599-1619
Citations: 0
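To make the term "conditional intensity function" in the abstract above concrete, here is a minimal sketch of a classical (non-deep) example: a univariate Hawkes process with an exponential kernel and its log-likelihood on an observation window [0, T]. The function names and the parameters `mu`, `alpha`, and `beta` are assumptions chosen for illustration; the survey's deep models replace this hand-written intensity with learned history encoders.

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """lambda(t) = mu + alpha * sum over past events t_j < t of exp(-beta * (t - t_j))."""
    return mu + alpha * sum(math.exp(-beta * (t - tj)) for tj in history if tj < t)

def hawkes_log_likelihood(events, T, mu, alpha, beta):
    """Log-likelihood of sorted event times observed on [0, T].

    It is the sum of log-intensities at the events minus the integral of
    the intensity over [0, T] (the compensator), which has a closed form
    for the exponential kernel.
    """
    log_term = sum(math.log(hawkes_intensity(t, events, mu, alpha, beta)) for t in events)
    compensator = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - t)) for t in events
    )
    return log_term - compensator
```

Maximising this log-likelihood over the parameters is one of the learning approaches that the abstract groups under "optimization".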
RMD-Graph: Adversarial Attacks Resisting Malicious Domain Detection Based on Dual Denoising
IF 8.9, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-12-20 DOI: 10.1109/TKDE.2024.3520798
Sanfeng Zhang;Luyao Huang;Zheng Zhang;Wenduan Xu;Wang Yang;Linfeng Liu
Abstract: The Domain Name System (DNS) is a critical Internet service that translates domain names into IPs, but it is often targeted by attackers, posing a serious security risk. Graph-based models for detecting malicious domains have shown high performance but are vulnerable to adversarial attacks. To address this issue, we propose RMD-Graph, which is characterized by its ability to resist adversarial attacks and its low dependency on labeled data. A dual denoising module is specifically designed based on two autoencoders to generate the reconstructed graph, where SVD, TOP-k and reconstruction loss are introduced to enhance the denoising capability of autoencoders. Subsequently, residual connections are employed to generate an optimized graph that retains essential information from the original graph. The reconstructed graph and the optimized graph are then utilized as two views for graph contrastive learning, thereby achieving a self-supervised representation learning task without labels. In the downstream malicious domain detection, the denoised node representations are employed for machine learning classification. Extensive experiments are conducted on publicly available DNS datasets, and the results demonstrate that RMD-Graph significantly outperforms known baseline methods, especially in adversarial scenarios.
Vol. 37, No. 3, pp. 1394-1410
Citations: 0
Colorful Star Motif Counting: Concepts, Algorithms and Applications
IF 8.9, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-12-13 DOI: 10.1109/TKDE.2024.3514997
Hongchao Qin;Gao Sen;Rong-Hua Li;Hongzhi Chen;Ye Yuan;Guoren Wang
Abstract: A colorful star motif is a star-shaped graph where any two nodes have different colors. Counting the colorful star motif can help to analyze the structural properties of real-life colorful graphs, model higher-order clustering, and accelerate the mining of the densest subgraph exhibiting $h$-clique characteristics in graphs. In this manuscript, we introduce the concept of colorful $h$-star in a colored graph and propose two higher-order cohesive subgraph models, namely colorful $h$-star core and colorful $h$-star truss. We show that the colorful $h$-stars can be counted and updated very efficiently using a novel dynamic programming (DP) algorithm. Based on the proposed DP algorithm, we develop a colorful $h$-star core decomposition algorithm which takes $O(hm)$ time and $O(hn+m)$ space, and a colorful $h$-star truss decomposition algorithm which takes $O(hm^{1.5})$ time and $O(hm)$ space, where $m$ and $n$ denote the number of edges and nodes of the graph respectively. Moreover, we also propose a graph reduction technique based on our colorful $h$-star core model to accelerate the computation of the approximation algorithm for $h$-clique densest subgraph mining. The results of comprehensive experiments on 11 large real-world datasets demonstrate the efficiency, scalability and effectiveness of the proposed algorithms.
Vol. 37, No. 3, pp. 1105-1125
Citations: 0
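The counting idea in the abstract above can be illustrated with a small dynamic program. This is a sketch under stated assumptions, not the paper's algorithm: it assumes an $h$-star is a center node plus $h-1$ of its neighbors, that all $h$ nodes must have pairwise distinct colors, and that the graph is given as an adjacency dict `adj` with a color dict `color` (both hypothetical names). Under that reading, the count for one center is the elementary symmetric polynomial of degree $h-1$ in the sizes of the neighbor color groups.

```python
from collections import Counter

def colorful_star_count(adj, color, center, h):
    """Count h-node stars centred at `center` whose nodes all have distinct colours."""
    # Neighbours sharing the centre's colour can never be leaves.
    groups = Counter(color[u] for u in adj[center] if color[u] != color[center])
    leaves = h - 1
    # dp[j] = number of ways to pick j leaves with pairwise distinct colours
    # from the colour groups processed so far.
    dp = [1] + [0] * leaves
    for size in groups.values():
        # Iterate j downwards so each colour group contributes at most one leaf.
        for j in range(leaves, 0, -1):
            dp[j] += dp[j - 1] * size
    return dp[leaves]
```

For a center with k distinct neighbor colors this takes O(k h) time after grouping, which is in the same spirit as the linear-in-$h$ bounds quoted in the abstract, although the paper's update procedure may differ.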
Delayed Bottlenecking: Alleviating Forgetting in Pre-trained Graph Neural Networks
IF 8.9, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-12-12 DOI: 10.1109/TKDE.2024.3516192
Zhe Zhao;Pengkun Wang;Xu Wang;Haibin Wen;Xiaolong Xie;Zhengyang Zhou;Qingfu Zhang;Yang Wang
Abstract: Pre-training GNNs to extract transferable knowledge and apply it to downstream tasks has become the de facto standard of graph representation learning. Recent works focused on designing self-supervised pre-training tasks to extract useful and universal transferable knowledge from large-scale unlabeled data. However, they have to face an inevitable question: traditional pre-training strategies that aim at extracting useful information about pre-training tasks may not extract all useful information about the downstream task. In this paper, we reexamine the pre-training process within traditional pre-training and fine-tuning frameworks from the perspective of Information Bottleneck (IB) and confirm that the forgetting phenomenon in the pre-training phase may cause detrimental effects on downstream tasks. Therefore, we propose a novel Delayed Bottlenecking Pre-training (DBP) framework which maintains as much mutual information as possible between latent representations and training data during the pre-training phase by suppressing the compression operation, and delays the compression operation to the fine-tuning phase to make sure the compression can be guided with labeled fine-tuning data and downstream tasks. To achieve this, we design two information control objectives that can be directly optimized and further integrate them into the actual model design. Extensive experiments on both chemistry and biology domains demonstrate the effectiveness of DBP.
Vol. 37, No. 3, pp. 1140-1153
Citations: 0
Unconstrained Fuzzy C-Means Based on Entropy Regularization: An Equivalent Model
IF 8.9, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-12-12 DOI: 10.1109/TKDE.2024.3516085
Feiping Nie;Runxin Zhang;Yu Duan;Rong Wang
Abstract: Fuzzy c-means based on entropy regularization (FCER) is a commonly used machine learning algorithm that uses maximum entropy as the regularization term to realize fuzzy clustering. However, this model has many constraints and is challenging to optimize directly. During the solution process, the membership matrix and cluster centers are alternately optimized, easily converging to poor local solutions, limiting the clustering performance. In this paper, we start with the optimization model and propose an unconstrained fuzzy clustering model (UFCER) equivalent to FCER, which reduces the size of optimization variables from $(n+d)\times c$ to $d\times c$. More importantly, there is no need to calculate the membership matrix iteratively during the optimization process. The time complexity is only linear, and the convergence speed is fast. We conduct extensive experiments on real datasets. The comparison of objective function value and clustering performance fully demonstrates that under the same initialization, our proposed algorithm can converge to smaller local minima and get better clustering performance.
Vol. 37, No. 2, pp. 979-990
Citations: 0
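The reduction from $(n+d)\times c$ to $d\times c$ variables described above can be illustrated with the textbook entropy-regularized formulation. The sketch below assumes the objective $\sum_{i,k} u_{ik}\|x_i-m_k\|^2 + \gamma\sum_{i,k} u_{ik}\log u_{ik}$ with each row of memberships summing to one; the paper's exact model and notation may differ, and the function names and `gamma` are illustrative. Under this assumption the optimal memberships are a softmax of the negative squared distances, and substituting them back leaves a log-sum-exp objective in the centers alone.

```python
import math

def sq_dist(x, m):
    return sum((a - b) ** 2 for a, b in zip(x, m))

def memberships(X, M, gamma):
    """Closed-form memberships: softmax of -distance / gamma over clusters."""
    U = []
    for x in X:
        d = [sq_dist(x, m) for m in M]
        lo = min(d)  # shift by the minimum for numerical stability
        e = [math.exp(-(dk - lo) / gamma) for dk in d]
        s = sum(e)
        U.append([ek / s for ek in e])
    return U

def constrained_objective(X, M, U, gamma):
    """Distance term plus gamma times the negative-entropy term."""
    total = 0.0
    for x, u in zip(X, U):
        for m, uk in zip(M, u):
            if uk > 0.0:
                total += uk * sq_dist(x, m) + gamma * uk * math.log(uk)
    return total

def unconstrained_objective(X, M, gamma):
    """-gamma * sum_i log sum_k exp(-d_ik / gamma), a function of the centres only."""
    total = 0.0
    for x in X:
        d = [sq_dist(x, m) for m in M]
        lo = min(d)
        total += lo - gamma * math.log(sum(math.exp(-(dk - lo) / gamma) for dk in d))
    return total
```

Evaluating `constrained_objective` at the closed-form memberships gives the same value as `unconstrained_objective`, which is why the membership matrix need not be stored or updated when optimizing over the centers.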
Heavy Nodes in a Small Neighborhood: Exact and Peeling Algorithms With Applications
IF 8.9, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2024-12-11 DOI: 10.1109/TKDE.2024.3515875
Ling Li;Hilde Verbeek;Huiping Chen;Grigorios Loukides;Robert Gwadera;Leen Stougie;Solon P. Pissis
Abstract: We introduce a weighted and unconstrained variant of the well-known minimum $k$ union problem: Given a bipartite graph $\mathcal{G}(U,V,E)$ with weights for all nodes in $V$, find a set $S\subseteq V$ such that the ratio between the total weight of the nodes in $S$ and the number of their distinct adjacent nodes in $U$ is maximized. Our problem, which we term Heavy Nodes in a Small Neighborhood (HNSN), finds applications in marketing, team formation, and money laundering detection. For example, in the latter application, $S$ represents bank account holders who obtain illicit money from some peers of a criminal and route it through their accounts to a target account belonging to the criminal. We prove that HNSN can be solved exactly in polynomial time via linear programming. We also develop several algorithms offering different effectiveness/efficiency trade-offs: an exact algorithm, based on node contraction, graph decomposition, and linear programming, as well as three peeling algorithms. The first peeling algorithm is a near-linear time approximation algorithm with a tight approximation ratio, the second is an iterative algorithm that converges to an optimal solution in a very small number of iterations in practice, and the third is a near-linear time greedy heuristic. In addition, we formalize a money laundering scenario involving multiple target accounts and show how our algorithms can be extended to deal with it. Our experiments on real and synthetic datasets show that our algorithms find (near-)optimal solutions, outperforming a natural baseline, and that they can detect money laundering more effectively and efficiently than two state-of-the-art methods.
Vol. 37, No. 4, pp. 1853-1870
Citations: 0
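The HNSN objective stated in the abstract above can be made concrete on a toy instance. The sketch below is an exhaustive search that is exponential in the size of $V$; it is meant only to show what is being maximized and is not any of the paper's exact or peeling algorithms. The names `weight` and `nbrs` are hypothetical, and the code assumes every node in $V$ has at least one neighbor in $U$, so the denominator is never zero.

```python
from itertools import combinations

def hnsn_ratio(S, weight, nbrs):
    """Total weight of S divided by the number of distinct neighbours of S in U."""
    covered = set().union(*(nbrs[v] for v in S))
    return sum(weight[v] for v in S) / len(covered)

def hnsn_brute_force(weight, nbrs):
    """Try every non-empty subset of V and keep the one with the best ratio."""
    nodes = sorted(weight)
    best_set, best_ratio = None, float("-inf")
    for r in range(1, len(nodes) + 1):
        for S in combinations(nodes, r):
            ratio = hnsn_ratio(S, weight, nbrs)
            if ratio > best_ratio:
                best_set, best_ratio = set(S), ratio
    return best_set, best_ratio
```

With `weight = {"a": 3, "b": 3, "c": 1}` and `nbrs = {"a": {"u1", "u2"}, "b": {"u1", "u2"}, "c": {"u3"}}`, nodes `a` and `b` share the same two neighbors, so taking both doubles the weight without enlarging the neighborhood, which is the "heavy nodes, small neighborhood" effect the problem name describes.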