Heavy Nodes in a Small Neighborhood: Exact and Peeling Algorithms With Applications
Ling Li; Hilde Verbeek; Huiping Chen; Grigorios Loukides; Robert Gwadera; Leen Stougie; Solon P. Pissis
IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 4, pp. 1853-1870, published 2024-12-11
DOI: 10.1109/TKDE.2024.3515875
https://ieeexplore.ieee.org/document/10792980/
Abstract
We introduce a weighted and unconstrained variant of the well-known minimum $k$ union problem: Given a bipartite graph $\mathcal{G}(U,V,E)$ with weights for all nodes in $V$, find a set $S\subseteq V$ such that the ratio between the total weight of the nodes in $S$ and the number of their distinct adjacent nodes in $U$ is maximized. Our problem, which we term Heavy Nodes in a Small Neighborhood (HNSN), finds applications in marketing, team formation, and money laundering detection. For example, in the last of these applications, $S$ represents bank account holders who obtain illicit money from some peers of a criminal and route it through their accounts to a target account belonging to the criminal. We prove that HNSN can be solved exactly in polynomial time via linear programming. We also develop several algorithms offering different effectiveness/efficiency trade-offs: an exact algorithm, based on node contraction, graph decomposition, and linear programming, as well as three peeling algorithms. The first peeling algorithm is a near-linear time approximation algorithm with a tight approximation ratio, the second is an iterative algorithm that converges to an optimal solution in a very small number of iterations in practice, and the third is a near-linear time greedy heuristic. In addition, we formalize a money laundering scenario involving multiple target accounts and show how our algorithms can be extended to deal with it. Our experiments on real and synthetic datasets show that our algorithms find (near-)optimal solutions, outperforming a natural baseline, and that they can detect money laundering more effectively and efficiently than two state-of-the-art methods.
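To make the objective concrete, the following is a minimal sketch on a toy instance, not the authors' implementation. It evaluates the HNSN ratio for a candidate set $S$, finds the optimum by exhaustive search, and runs a generic peeling heuristic that repeatedly removes the node of $U$ whose removal discards the least weight. The names `ratio`, `brute_force`, `greedy_peel`, `weight`, and `adj` are hypothetical, the peeling rule is an assumption chosen for illustration (the abstract does not specify the paper's peeling rules), and every node of $V$ is assumed to have at least one neighbor in $U$.

```python
from itertools import combinations


def ratio(S, weight, adj):
    """Total weight of S divided by the number of distinct neighbors of S in U."""
    if not S:
        return 0.0
    neighbors = set().union(*(adj[v] for v in S))
    return sum(weight[v] for v in S) / len(neighbors)


def brute_force(weight, adj):
    """Try every non-empty subset of V; exponential, so toy inputs only."""
    nodes = sorted(adj)
    best_set, best_val = set(), 0.0
    for r in range(1, len(nodes) + 1):
        for combo in combinations(nodes, r):
            val = ratio(combo, weight, adj)
            if val > best_val:
                best_set, best_val = set(combo), val
    return best_set, best_val


def greedy_peel(weight, adj):
    """Illustrative peeling heuristic (an assumption, not the paper's algorithm).

    Repeatedly peel the U-node that discards the least weight, and remember
    the best ratio seen. This simple version is quadratic, not near-linear.
    """
    alive = set(adj)                    # V-nodes whose neighbors are all still kept
    kept = set().union(*adj.values())   # U-nodes not yet peeled
    best_set, best_val = set(alive), ratio(alive, weight, adj)
    while len(kept) > 1:
        # Weight of alive V-nodes that would be discarded with each U-node.
        loss = {u: 0.0 for u in kept}
        for v in alive:
            for u in adj[v]:
                loss[u] += weight[v]
        u_min = min(sorted(kept), key=loss.get)  # sorted() makes ties deterministic
        kept.remove(u_min)
        alive = {v for v in alive if u_min not in adj[v]}
        val = ratio(alive, weight, adj)
        if val > best_val:
            best_set, best_val = set(alive), val
    return best_set, best_val


# Toy instance: node weights for V and, for each node of V, its neighbors in U.
weight = {"a": 5, "b": 4, "c": 1, "d": 3}
adj = {
    "a": {"u1", "u2"},
    "b": {"u1", "u2"},
    "c": {"u3"},
    "d": {"u2", "u3", "u4"},
}

optimal_set, optimal_ratio = brute_force(weight, adj)
peeled_set, peeled_ratio = greedy_peel(weight, adj)
```

On this instance both functions return the set {a, b} with ratio 9/2 = 4.5: the two heaviest nodes share the same two neighbors, so adding any other node enlarges the neighborhood faster than it adds weight. Agreement on a toy example says nothing about the heuristic's quality in general.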
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.