Zhibin Gu, Songhe Feng, Zhendong Li, Jiazheng Yuan, Jun Liu
{"title":"NOODLE: Joint Cross-View Discrepancy Discovery and High-Order Correlation Detection for Multi-View Subspace Clustering","authors":"Zhibin Gu, Songhe Feng, Zhendong Li, Jiazheng Yuan, Jun Liu","doi":"10.1145/3653305","DOIUrl":"https://doi.org/10.1145/3653305","url":null,"abstract":"<p>Benefiting from the effective exploration of the valuable topological pair-wise relationship of data points across multiple views, multi-view subspace clustering (MVSC) has received increasing attention in recent years. However, we observe that existing MVSC approaches still suffer from two limitations that need to be further improved to enhance the clustering effectiveness. Firstly, previous MVSC approaches mainly prioritize extracting multi-view consistency, often neglecting the cross-view discrepancy that may arise from noise, outliers, and view-inherent properties. Secondly, existing techniques are constrained by their reliance on pair-wise sample correlation and pair-wise view correlation, failing to capture the high-order correlations that are enclosed within multiple views. To address these issues, we propose a novel MVSC framework via joi<b>N</b>t cr<b>O</b>ss-view discrepancy disc<b>O</b>very an<b>D</b> high-order corre<b>L</b>ation d<b>E</b>tection (<b>NOODLE</b>), seeking an informative target subspace representation compatible across multiple features to facilitate the downstream clustering task. Specifically, we first exploit the self-representation mechanism to learn multiple view-specific affinity matrices, which are further decomposed into cohesive factors and incongruous factors to fit the multi-view consistency and discrepancy, respectively. Additionally, an explicit cross-view sparse regularization is applied to incoherent parts, ensuring the consistency and discrepancy to be precisely separated from the initial subspace representations. Meanwhile, the multiple cohesive parts are stacked into a three-dimensional tensor associated with a tensor-Singular Value Decomposition (t-SVD) based weighted tensor nuclear norm constraint, enabling effective detection of the high-order correlations implicit in multi-view data. Our proposed method outperforms state-of-the-art methods for multi-view clustering on six benchmark datasets, demonstrating its effectiveness.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"23 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140196302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representative and Back-In-Time Sampling from Real-World Hypergraphs","authors":"Minyoung Choe, Jaemin Yoo, Geon Lee, Woonsung Baek, U Kang, Kijung Shin","doi":"10.1145/3653306","DOIUrl":"https://doi.org/10.1145/3653306","url":null,"abstract":"<p>Graphs are widely used for representing pairwise interactions in complex systems. Since such real-world graphs are large and often evergrowing, sampling subgraphs is useful for various purposes, including simulation, visualization, stream processing, representation learning, and crawling. However, many complex systems consist of group interactions (e.g., collaborations of researchers and discussions on online Q&A platforms) and thus are represented more naturally and accurately by hypergraphs than by ordinary graphs. Motivated by the prevalence of large-scale hypergraphs, we study the problem of sampling from real-world hypergraphs, aiming to answer (Q1) how can we measure the goodness of sub-hypergraphs, and (Q2) how can we efficiently find a “good” sub-hypergraph. Regarding Q1, we distinguish between two goals: (a) <i>representative sampling</i>, which aims to capture the characteristics of the input hypergraph, and (b) <i>back-in-time sampling</i>, which aims to closely approximate a past snapshot of the input time-evolving hypergraph. To evaluate the similarity of the sampled sub-hypergraph to the target (i.e., the input hypergraph or its past snapshot), we consider 10 graph-level, hyperedge-level, and node-level statistics. Regarding Q2, we first conduct a thorough analysis of various intuitive approaches using 11 real-world hypergraphs, Then, based on this analysis, we propose <span>MiDaS</span> and <span>MiDaS-B</span>, designed for representative sampling and back-in-time sampling, respectively. Regarding representative sampling, we demonstrate through extensive experiments that <span>MiDaS</span>, which employs a sampling bias towards high-degree nodes in hyperedge selection, is (a) <b>Representative</b>: finding overall the most representative samples among 15 considered approaches, (b) <b>Fast</b>: several orders of magnitude faster than the strongest competitors, and (c) <b>Automatic</b>: automatically tuning the degree of sampling bias. Regarding back-in-time sampling, we demonstrate that <span>MiDaS-B</span> inherits the strengths of <span>MiDaS</span> despite an additional challenge—the unavailability of the target (i.e., past snapshot). It effectively handles this challenge by focusing on replicating universal evolutionary patterns, rather than directly replicating the target.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"26 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140167461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised Multi-view Clustering based on Nonnegative Matrix Factorization with Fusion Regularization","authors":"Guosheng Cui, Ruxin Wang, Dan Wu, Ye Li","doi":"10.1145/3653022","DOIUrl":"https://doi.org/10.1145/3653022","url":null,"abstract":"<p>Multi-view clustering has attracted significant attention and application. Nonnegative matrix factorization is one popular feature learning technology in pattern recognition. In recent years, many semi-supervised nonnegative matrix factorization algorithms are proposed by considering label information, which has achieved outstanding performance for multi-view clustering. However, most of these existing methods have either failed to consider discriminative information effectively or included too much hyper-parameters. Addressing these issues, a semi-supervised multi-view nonnegative matrix factorization with a novel fusion regularization (FRSMNMF) is developed in this paper. In this work, we uniformly constrain alignment of multiple views and discriminative information among clusters with designed fusion regularization. Meanwhile, to align the multiple views effectively, two kinds of compensating matrices are used to normalize the feature scales of different views. Additionally, we preserve the geometry structure information of labeled and unlabeled samples by introducing the graph regularization simultaneously. Due to the proposed methods, two effective optimization strategies based on multiplicative update rules are designed. Experiments implemented on six real-world datasets have demonstrated the effectiveness of our FRSMNMF comparing with several state-of-the-art unsupervised and semi-supervised approaches.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"47 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuhan Wang, Qing Xie, Mengzi Tang, Lin Li, Jingling Yuan, Yongjian Liu
{"title":"A Dual Perspective Framework of Knowledge-correlation for Cross-domain Recommendation","authors":"Yuhan Wang, Qing Xie, Mengzi Tang, Lin Li, Jingling Yuan, Yongjian Liu","doi":"10.1145/3652520","DOIUrl":"https://doi.org/10.1145/3652520","url":null,"abstract":"<p>Recommender System provides users with online services in a personalized way. The performance of traditional recommender systems may deteriorate because of problems such as cold-start and data sparsity. Cross-domain Recommendation System utilizes the richer information from auxiliary domains to guide the task in the target domain. However, direct knowledge transfer may lead to a negative impact due to data heterogeneity and feature mismatch between domains. In this paper, we innovatively explore the cross-domain correlation from the perspectives of content semanticity and structural connectivity to fully exploit the information of Knowledge Graph. First, we adopt domain adaptation that automatically extracts transferable features to capture cross-domain semantic relations. Second, we devise a knowledge-aware graph neural network to explicitly model the high-order connectivity across domains. Third, we develop feature fusion strategies to combine the advantages of semantic and structural information. By simulating the cold-start scenario on two real-world datasets, the experimental results show that our proposed method has superior performance in accuracy and diversity compared with the SOTA methods. It demonstrates that our method can accurately predict users’ expressed preferences while exploring their potential diverse interests.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"15 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chi Zhang, Linhao Cai, Meng Chen, Xiucheng Li, Gao Cong
{"title":"DeepMeshCity: A Deep Learning Model for Urban Grid Prediction","authors":"Chi Zhang, Linhao Cai, Meng Chen, Xiucheng Li, Gao Cong","doi":"10.1145/3652859","DOIUrl":"https://doi.org/10.1145/3652859","url":null,"abstract":"<p>Urban grid prediction can be applied to many classic spatial-temporal prediction tasks such as air quality prediction, crowd density prediction, and traffic flow prediction, which is of great importance to smart city building. In light of its practical values, many methods have been developed for it and have achieved promising results. Despite their successes, two main challenges remain open: a) how to well capture the global dependencies and b) how to effectively model the multi-scale spatial-temporal correlations? To address these two challenges, we propose a novel method—<sans-serif>DeepMeshCity</sans-serif>, with a carefully-designed Self-Attention Citywide Grid Learner (<sans-serif>SA-CGL</sans-serif>) block comprising of a self-attention unit and a Citywide Grid Learner (<sans-serif>CGL</sans-serif>) unit. The self-attention block aims to capture the global spatial dependencies, and the <sans-serif>CGL</sans-serif> unit is responsible for learning the spatial-temporal correlations. In particular, a multi-scale memory unit is proposed to traverse all stacked <sans-serif>SA-CGL</sans-serif> blocks along a zigzag path to capture the multi-scale spatial-temporal correlations. In addition, we propose to initialize the single-scale memory units and the multi-scale memory units by using the corresponding ones in the previous fragment stack, so as to speed up the model training. We evaluate the performance of our proposed model by comparing with several state-of-the-art methods on four real-world datasets for two urban grid prediction applications. The experimental results verify the superiority of DeepMeshCity over the existing ones. The code is available at https://github.com/ILoveStudying/DeepMeshCity.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"19 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FulBM: Fast fully batch maintenance for landmark-based 3-hop cover labeling","authors":"Wentai Zhang, HaiHong E, HaoRan Luo, Mingzhi Sun","doi":"10.1145/3650035","DOIUrl":"https://doi.org/10.1145/3650035","url":null,"abstract":"<p>Landmark-based 3-hop cover labeling is a category of approaches for shortest distance/path queries on large-scale complex networks. It pre-computes an index offline to accelerate the online distance/path query. Most real-world graphs undergo rapid changes in topology, which makes index maintenance on dynamic graphs necessary. So far, the majority of index maintenance methods can handle only one edge update (either an addition or deletion) each time. To keep up with frequently changing graphs, we research the <b>ful</b><i>ly</i> <b>b</b><i>atch</i> <b>m</b><i>aintenance</i> problem for the 3-hop cover labeling, and proposed the method called <i>FulBM</i>. FulBM is composed of two algorithms: InsBM and DelBM, which are designed to handle batch edge insertions and deletions respectively. This separation is motivated by the insight that batch maintenance for edge insertions are much more time-efficient, and the fact that most edge updates in the real world are incremental. Both InsBM and DelBM are equipped with well-designed pruning strategies to minimize the number of vertex accesses. We have conducted comprehensive experiments on both synthetic and real-world graphs to verify the efficiency of FulBM and its variants for weighted graphs. The results show that our methods achieve 5.5 × to 228 × speedup compared with the state-of-the-art method.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"169 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Node Embedding Preserving Graph Summarization","authors":"Houquan Zhou, Shenghua Liu, Huawei Shen, Xueqi Cheng","doi":"10.1145/3649505","DOIUrl":"https://doi.org/10.1145/3649505","url":null,"abstract":"<p>Graph summarization is a useful tool for analyzing large-scale graphs. Some works tried to preserve original node embeddings encoding rich structural information of nodes on the summary graph. However, their algorithms are designed heuristically and not theoretically guaranteed. In this paper, we theoretically study the problem of preserving node embeddings on summary graph. We prove that three matrix-factorization based node embedding methods of the original graph can be approximated by that of the summary graph, and propose a novel graph summarization method, named <span>HCSumm</span>, based on this analysis. Extensive experiments are performed on real-world datasets to evaluate the effectiveness of our proposed method. The experimental results show that our method outperforms the state-of-the-art methods in preserving node embeddings.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"60 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140073167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Content-Aware Influence Maximization via Online Learning to Rank","authors":"Konstantinos Theocharidis, Panagiotis Karras, Manolis Terrovitis, Spiros Skiadopoulos, Hady W. Lauw","doi":"10.1145/3651987","DOIUrl":"https://doi.org/10.1145/3651987","url":null,"abstract":"<p>How can we adapt the composition of a post over a series of rounds to make it more appealing in a social network? Techniques that progressively learn how to make a <i>fixed</i> post more influential over rounds have been studied in the context of the <i>Influence Maximization</i> (IM) problem, which seeks a set of <i>seed users</i> that maximize a post’s influence. However, there is no work on progressively learning how a post’s <i>features</i> affect its influence. In this paper, we propose and study the problem of <i>Adaptive Content-Aware Influence Maximization</i> (ACAIM), which calls to find <i>k</i> features to form a post in each round so as to maximize the cumulative influence of those posts over all rounds. We solve ACAIM by applying, for the first time, an <i>Online Learning to Rank</i> (OLR) framework to IM purposes. We introduce the CATRID <i>propagation model</i>, which expresses how posts disseminate in a social network using click probabilities and post visibility criteria, and develop a <i>simulator</i> that runs CATRID via a training-testing scheme based on real posts of the VK social network, so as to realistically represent the learning environment. We deploy three <i>learners</i> that solve ACAIM in an online (real-time) manner. We experimentally prove the practical suitability of our solutions via exhaustive experiments on multiple brands (operating as different <i>case studies</i>) and several VK datasets; the best learner is evaluated on 45 separate <i>case studies</i> yielding convincing results.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"57 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140073084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Derun Song, Enneng Yang, Guibing Guo, Li Shen, Linying Jiang, Xingwei Wang
{"title":"Multi-Scenario and Multi-Task Aware Feature Interaction for Recommendation System","authors":"Derun Song, Enneng Yang, Guibing Guo, Li Shen, Linying Jiang, Xingwei Wang","doi":"10.1145/3651312","DOIUrl":"https://doi.org/10.1145/3651312","url":null,"abstract":"<p>Multi-scenario and multi-task recommendation can use various feedback behaviors of users in different scenarios to learn users’ preferences and then make recommendations, which has attracted attention. However, the existing work ignores feature interactions and the fact that a pair of feature interactions will have differing levels of importance under different scenario-task pairs, leading to sub-optimal user preference learning. In this paper, we propose a <b>M</b>ulti-scenario and <b>M</b>ulti-task aware <b>F</b>eature <b>I</b>nteraction model, dubbed <b>MMFI</b>, to explicitly model feature interactions and learn the importance of feature interaction pairs in different scenarios and tasks. Specifically, MMFI first incorporates a pairwise feature interaction unit and a scenario-task interaction unit to effectively capture the interaction of feature pairs and scenario-task pairs. Then MMFI designs a scenario-task aware attention layer for learning the importance of feature interactions from coarse-grained to fine-grained, improving the model’s performance on various scenario-task pairs. More specifically, this attention layer consists of three modules: a fully shared bottom module, a partially shared middle module, and a specific output module. Finally, MMFI adapts two sparsity-aware functions to remove some useless feature interactions. Extensive experiments on two public datasets demonstrate the superiority of the proposed method over the existing multi-task recommendation, multi-scenario recommendation, and multi-scenario & multi-task recommendation models.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"33 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140045369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sarwan Ali, Muhammad Ahmad, Maham Anwer Beg, Imdad Ullah Khan, Safiullah Faizullah, Muhammad Asad Khan
{"title":"SsAG: Summarization and Sparsification of Attributed Graphs","authors":"Sarwan Ali, Muhammad Ahmad, Maham Anwer Beg, Imdad Ullah Khan, Safiullah Faizullah, Muhammad Asad Khan","doi":"10.1145/3651619","DOIUrl":"https://doi.org/10.1145/3651619","url":null,"abstract":"<p>Graph summarization has become integral for managing and analyzing large-scale graphs in diverse real-world applications, including social networks, biological networks, and communication networks. Existing methods for graph summarization often face challenges, being either computationally expensive, limiting their applicability to large graphs, or lacking the incorporation of node attributes. In response, we introduce <span>SsAG</span>, an efficient and scalable lossy graph summarization method designed to preserve the essential structure of the original graph. <span>SsAG</span> computes a sparse representation (summary) of the input graph, accommodating graphs with node attributes. The summary is structured as a graph on supernodes (subsets of vertices of <i>G</i>), where weighted superedges connect pairs of supernodes. The methodology focuses on constructing a summary graph with <i>k</i> supernodes, aiming to minimize the reconstruction error (the difference between the original graph and the graph reconstructed from the summary) while maximizing homogeneity with respect to the node attributes. The construction process involves iteratively merging pairs of nodes. To enhance computational efficiency, we derive a closed-form expression for efficiently computing the reconstruction error (RE) after merging a pair, enabling constant-time approximation of this score. We assign a weight to each supernode, quantifying their contribution to the score of pairs, and utilize a weighted sampling strategy to select the best pair for merging. Notably, a logarithmic-sized sample achieves a summary comparable in quality based on various measures. Additionally, we propose a sparsification step for the constructed summary, aiming to reduce storage costs to a specified target size with a marginal increase in RE. Empirical evaluations across diverse real-world graphs demonstrate that <span>SsAG</span> exhibits superior speed, being up to 17 × faster, while generating summaries of comparable quality. This work represents a significant advancement in the field, addressing computational challenges and showcasing the effectiveness of <span>SsAG</span> in graph summarization.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"104 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140045427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}