Proceedings of the ACM on Management of Data: Latest Articles, Page 4

Efficient Algorithm for Budgeted Adaptive Influence Maximization: An Incremental RR-set Update Approach

Proceedings of the ACM on Management of Data, Pub Date: 2023-11-13, DOI: 10.1145/3617328

Qintian Guo, Chen Feng, Fangyuan Zhang, Sibo Wang

Abstract: Given a graph G, a cost associated with each node, and a budget B, budgeted influence maximization (BIM) aims to find a seed set S that maximizes influence over all node sets whose total cost is no larger than B. Existing solutions mainly follow the non-adaptive idea: they determine all the seeds before observing any actual diffusion. Lacking actual diffusion information, they may achieve unsatisfactory influence spread. Motivated by this limitation, this paper makes the first attempt to solve the BIM problem in the adaptive setting, where seed nodes are selected iteratively after observing the diffusion results of the previous seeds. We design the first practical algorithm that achieves an expected approximation guarantee by probabilistically adopting either a cost-aware greedy strategy or a single influential node. We further develop an optimized version that improves its practical performance in terms of influence spread.

In addition, the scalability of adaptive IM-related problems remains an open issue. These problems usually involve multiple rounds (e.g., as many as the number of seeds), and in each round they must construct enough new reverse-reachable set (RR-set) samples for the claimed approximation guarantee to actually hold. This incurs prohibitive computation and limits real applications. To resolve this dilemma, we propose an incremental update approach. It maintains extra construction information while building RR-sets, so that it can quickly correct a problematic RR-set starting from the very step where that RR-set was first affected. As a result, we recycle RR-sets at a small computational cost while still providing a correctness guarantee. Finally, extensive experiments on large-scale real graphs demonstrate that our algorithms outperform the baselines in terms of both influence spread and running time.
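Two ideas from this abstract can be illustrated in isolation: estimating influence from RR-set samples, and choosing between a cost-aware greedy seed set and a single influential node. The sketch below is a simplified, non-adaptive illustration under assumptions of our own: the independent cascade (IC) model with a probability per edge, a deterministic choice of whichever candidate covers more RR-sets (the paper instead chooses probabilistically, in the adaptive setting), and hypothetical helper names `sample_rr_set` and `budgeted_greedy`. It does not implement the incremental RR-set update.

```python
import random
from collections import deque


def sample_rr_set(in_edges, nodes, rng):
    """Sample one reverse-reachable (RR) set under the independent cascade model.

    in_edges maps a node v to a list of (u, p) pairs, one per edge u -> v that is
    live with probability p. The RR-set holds every node that reaches a random
    root through live edges, found by a reverse BFS.
    """
    root = rng.choice(nodes)
    visited = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        # Each in-edge of v is examined once, when v is dequeued; its coin flip
        # only matters if the source u is not already in the RR-set.
        for u, p in in_edges.get(v, []):
            if u not in visited and rng.random() < p:
                visited.add(u)
                queue.append(u)
    return visited


def budgeted_greedy(rr_sets, cost, budget):
    """Return (seeds, covered): the better of a cost-aware greedy seed set
    (max marginal RR-set coverage per unit cost) and the single affordable
    node covering the most RR-sets. All costs must be positive."""
    # Inverted index: node -> ids of the RR-sets that contain it.
    index = {}
    for i, rr in enumerate(rr_sets):
        for v in rr:
            index.setdefault(v, []).append(i)

    covered = [False] * len(rr_sets)
    seeds, spent = [], 0.0
    candidates = set(index)
    while True:
        best, best_ratio = None, 0.0
        for v in candidates:
            if spent + cost[v] > budget:
                continue  # not affordable with the remaining budget
            gain = sum(1 for i in index[v] if not covered[i])
            if gain / cost[v] > best_ratio:
                best, best_ratio = v, gain / cost[v]
        if best is None:
            break  # no affordable node adds coverage
        seeds.append(best)
        spent += cost[best]
        candidates.discard(best)
        for i in index[best]:
            covered[i] = True
    greedy_cov = sum(covered)

    # Single most influential affordable node.
    single = max((v for v in index if cost[v] <= budget),
                 key=lambda v: len(index[v]), default=None)
    if single is not None and len(index[single]) > greedy_cov:
        return [single], len(index[single])
    return seeds, greedy_cov


if __name__ == "__main__":
    # Toy graph with certain edges 0->1, 0->2, 4->3 (all probabilities 1.0).
    in_edges = {1: [(0, 1.0)], 2: [(0, 1.0)], 3: [(4, 1.0)]}
    nodes = [0, 1, 2, 3, 4]
    rng = random.Random(42)
    rr_sets = [sample_rr_set(in_edges, nodes, rng) for _ in range(2000)]
    cost = {0: 2.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
    seeds, cov = budgeted_greedy(rr_sets, cost, budget=2.0)
    print(seeds, len(nodes) * cov / len(rr_sets))
```

With θ RR-sets over n nodes, n · covered / θ estimates the expected spread of the returned seeds (optimistically, since the same samples were used for selection). Taking the better of the two candidates guards against the classic failure case in which a cheap node with a high coverage-to-cost ratio consumes budget that one expensive but highly influential node would have used better.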

Citations: 0

Secure Sampling for Approximate Multi-party Query Processing

Proceedings of the ACM on Management of Data, Pub Date: 2023-11-13, DOI: 10.1145/3617339

Qiyao Luo, Yilei Wang, Ke Yi, Sheng Wang, Feifei Li

Citations: 0

Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation

Proceedings of the ACM on Management of Data, Pub Date: 2023-11-13, DOI: 10.1145/3617335

Zijin Feng, Miao Qiao, Hong Cheng

Abstract: A graph models the connections among objects. One important graph analytics task is clustering, which partitions a data graph into clusters with dense intra-cluster connections. One line of clustering methods maximizes a function called modularity. Modularity-based clustering is widely adopted on dyadic graphs because of its scalability and clustering quality, and that quality depends heavily on the choice of random graph model. The random graph model determines not only which clustering is preferred (modularity measures the quality of a clustering by its alignment with the edges of a random graph) but also the cost of computing such an alignment. Existing random hypergraph models either measure the hyperedge-cluster alignment in an All-Or-Nothing (AON) manner, losing important group-wise information, or introduce expensive alignment computation, which prevents the clustering from scaling up. This paper proposes a new random hypergraph model called the Hyperedge Expansion Model (HEM); a non-AON hypergraph modularity function based on HEM called Partial Innerclusteredge modularity (PI); a clustering algorithm called Partial Innerclusteredge Clustering (PIC) that optimizes PI; and novel computation optimizations. PIC is a scalable modularity-based hypergraph clustering method that effectively captures the non-AON hyperedge-cluster relation. Our experiments show that PIC outperforms eight state-of-the-art methods on real-world hypergraphs in terms of both clustering quality and scalability, and is up to five orders of magnitude faster than the baseline methods.
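The abstract's point that the random graph model decides what modularity rewards is easiest to see in the classic dyadic case. The sketch below computes standard Newman-Girvan modularity, whose null model is the degree-preserving configuration model; it is not the paper's HEM model or PI function, and the function name `modularity` and its edge-list and cluster-dictionary inputs are our own assumptions.

```python
from collections import defaultdict


def modularity(edges, cluster):
    """Newman-Girvan modularity of a clustering of an undirected, unweighted graph.

    Q = sum_c [ L_c / m - (d_c / (2m))^2 ], where m is the number of edges,
    L_c is the number of edges with both ends in cluster c, and d_c is the
    total degree of the nodes in c. The squared term is the expected fraction
    of intra-cluster edges under the configuration (degree-preserving) random
    graph model, i.e. the baseline the clustering's alignment is compared to.
    """
    m = len(edges)
    inner = defaultdict(int)   # L_c
    degree = defaultdict(int)  # d_c
    for u, v in edges:
        degree[cluster[u]] += 1
        degree[cluster[v]] += 1
        if cluster[u] == cluster[v]:
            inner[cluster[u]] += 1
    return sum(inner[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    merged = {v: "a" for v in range(6)}
    print(modularity(edges, split))   # 5/14, about 0.357
    print(modularity(edges, merged))  # 0.0
```

Putting every node in one cluster always scores 0, because the observed and expected intra-cluster fractions are both 1; a hypergraph model replaces the null-model term, which is where the AON versus non-AON choice enters.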

Citations: 0

Origin-Destination Travel Time Oracle for Map-based Services

Proceedings of the ACM on Management of Data, Pub Date: 2023-11-13, DOI: 10.1145/3617337

Yan Lin, Huaiyu Wan, Jilin Hu, Shengnan Guo, Bin Yang, Youfang Lin, Christian S. Jensen

Abstract: Given an origin (O), a destination (D), and a departure time (T), an Origin-Destination (OD) travel time oracle (ODT-Oracle) returns an estimate of the time it takes to travel from O to D when departing at T. ODT-Oracles serve important purposes in map-based services. To enable the construction of such oracles, we provide a travel-time estimation (TTE) solution that leverages historical trajectories to estimate time-varying travel times for OD pairs. The problem is complicated by the fact that multiple historical trajectories with different travel times may connect an OD pair, and these trajectories may vary from one another. To solve the problem, it is crucial to remove outlier trajectories when estimating travel times for future queries. We propose a novel two-stage framework, Diffusion-based Origin-destination Travel Time Estimation (DOT), that solves this problem. First, DOT employs a conditioned Pixelated Trajectories (PiT) denoiser that enables a diffusion-based PiT inference process by learning correlations between OD pairs and historical trajectories; specifically, given an OD pair and a departure time, it infers a PiT. Next, DOT includes a Masked Vision Transformer (MViT) that effectively and efficiently estimates a travel time from the inferred PiT. We report on extensive experiments on two real-world datasets that offer evidence that DOT is capable of outperforming baseline methods in terms of accuracy, scalability, and explainability.

Citations: 0

Fast Maximal Quasi-clique Enumeration: A Pruning and Branching Co-Design Approach

Proceedings of the ACM on Management of Data, Pub Date: 2023-11-13, DOI: 10.1145/3617331

Kaiqiang Yu, Cheng Long

Abstract: Mining cohesive subgraphs from a graph is a fundamental problem in graph data analysis. One notable cohesive structure is the γ-quasi-clique (QC), in which each vertex is connected to at least a fraction γ of the other vertices inside it. Enumerating the maximal γ-quasi-cliques (MQCs) of a graph has been widely studied and used in many applications, such as community detection and the discovery of significant biomolecule structures. A common practice for finding all MQCs is to (1) find a set of QCs containing all MQCs and then (2) filter out the non-maximal QCs. While quite a few branch-and-bound algorithms have been developed for finding a set of QCs that contains all MQCs, they all focus on sharpening the pruning techniques and devote little effort to improving the branching part. As a result, they provide no guarantee on pruning branches, and all have a worst-case time complexity of O*(2^n), where O* suppresses polynomial factors and n is the number of vertices in the graph. In this paper, we study the problem of finding a set of QCs containing all MQCs, but instead of further sharpening the pruning techniques as existing methods do, we attend to both the pruning and branching parts, developing new pruning techniques and branching methods that suit each other better and prune more branches both theoretically and practically. Specifically, we develop a new branch-and-bound algorithm called FastQC based on these newly developed pruning techniques and branching methods, which improves the worst-case time complexity to O*(α_k^n), where α_k is a positive real number strictly smaller than 2. Furthermore, we develop a divide-and-conquer strategy that boosts the performance of FastQC. Finally, we conduct extensive experiments on both real and synthetic datasets, and the results show that our algorithms are up to two orders of magnitude faster than the state of the art on real datasets.
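The γ-quasi-clique definition and the two-step practice described in this abstract can be made concrete with a brute-force sketch. Step (1) tests the QC condition on every vertex subset, which takes time exponential in the number of vertices, and step (2) drops QCs strictly contained in another QC. This is not FastQC: there is no pruning or branching, it is usable only on tiny graphs, and the helper names `is_quasi_clique`, `filter_maximal`, and `maximal_quasi_cliques` are hypothetical.

```python
from itertools import combinations


def is_quasi_clique(adj, S, gamma):
    """True if every vertex in S is adjacent to at least gamma * (|S| - 1)
    other vertices of S. adj maps each vertex to a set of neighbours
    (no self-loops)."""
    need = gamma * (len(S) - 1)
    return all(len(adj[v] & S) >= need for v in S)


def filter_maximal(qcs):
    """Step (2): keep only QCs not strictly contained in another QC."""
    unique = list(set(map(frozenset, qcs)))
    return [q for q in unique if not any(q < other for other in unique)]


def maximal_quasi_cliques(adj, gamma, min_size=2):
    """Step (1) by brute force over all vertex subsets, then step (2)."""
    vertices = sorted(adj)
    qcs = [
        frozenset(c)
        for k in range(min_size, len(vertices) + 1)
        for c in combinations(vertices, k)
        if is_quasi_clique(adj, frozenset(c), gamma)
    ]
    return filter_maximal(qcs)


if __name__ == "__main__":
    # A 4-cycle 0-1-2-3-0.
    adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(maximal_quasi_cliques(adj, 0.5))  # one MQC: the whole cycle
    print(maximal_quasi_cliques(adj, 1.0))  # gamma = 1 gives the 4 edges (maximal cliques)
```

Unlike cliques, quasi-cliques are not closed under taking subsets, so maximality cannot be decided by trying one-vertex extensions alone; this is why step (2) compares whole candidate sets against each other.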

Citations: 0