Chonghao Chen, Jianming Zheng, Wanyu Chen, Xin Zhang, Yupu Guo, Aimin Luo, Fei Cai
{"title":"基于学习的社区搜索的级联多尺度图预训练与提示调优","authors":"Chonghao Chen , Jianming Zheng , Wanyu Chen , Xin Zhang , Yupu Guo , Aimin Luo , Fei Cai","doi":"10.1016/j.ipm.2025.104285","DOIUrl":null,"url":null,"abstract":"<div><div>Learning-based community search aims to identify the cohesive subgraph containing specified query nodes through embedding the hidden community pattern into node representations. Given the limited availability of labeled community samples, some approaches leverage the graph topological structure to train the graph encoder in a semi-supervised or unsupervised learning manner. However, the common training strategies can result in the learning biases such as conflicting community structures and distant member omission. Additionally, the lack of authentic and complete community examples as supervisory signals hinders model’s adaptation to specific tasks. To overcome these challenges, we propose Cascading Multi-Scale <strong>G</strong>raph <strong>P</strong>re-training and <strong>P</strong>rompt Tuning for <strong>C</strong>ommunity <strong>S</strong>earch (<strong>GPP-CS</strong>), which integrates comprehensive pre-training objectives and lightweight prompt tuning to facilitate the community-related knowledge learning. Specially, the multi-scale graph pre-training leverages combining context-aware and global-aware training strategies to mitigate biases in the community pattern learning, equipping the graph encoder with well-initialized weights. The cohesiveness-aware prompt tuning employs the center points of potential communities to initialize the prompt vectors, efficiently transferring pre-trained knowledge to specific tasks. Extensive experiments conducted on multiple benchmark datasets demonstrate that GPP-CS consistently outperforms state-of-the-art baselines regarding inference accuracy and efficiency. In particular, GPP-CS achieves average improvements of 11.82% and 16.99% over the best baseline in terms of F1-score in the inductive and hybrid settings, respectively. Furthermore, GPP-CS exhibits strong robustness in low-resource scenarios.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104285"},"PeriodicalIF":6.9000,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cascading multi-scale graph pre-training and prompt tuning for learning-based community search\",\"authors\":\"Chonghao Chen , Jianming Zheng , Wanyu Chen , Xin Zhang , Yupu Guo , Aimin Luo , Fei Cai\",\"doi\":\"10.1016/j.ipm.2025.104285\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Learning-based community search aims to identify the cohesive subgraph containing specified query nodes through embedding the hidden community pattern into node representations. Given the limited availability of labeled community samples, some approaches leverage the graph topological structure to train the graph encoder in a semi-supervised or unsupervised learning manner. However, the common training strategies can result in the learning biases such as conflicting community structures and distant member omission. Additionally, the lack of authentic and complete community examples as supervisory signals hinders model’s adaptation to specific tasks. 
To overcome these challenges, we propose Cascading Multi-Scale <strong>G</strong>raph <strong>P</strong>re-training and <strong>P</strong>rompt Tuning for <strong>C</strong>ommunity <strong>S</strong>earch (<strong>GPP-CS</strong>), which integrates comprehensive pre-training objectives and lightweight prompt tuning to facilitate the community-related knowledge learning. Specially, the multi-scale graph pre-training leverages combining context-aware and global-aware training strategies to mitigate biases in the community pattern learning, equipping the graph encoder with well-initialized weights. The cohesiveness-aware prompt tuning employs the center points of potential communities to initialize the prompt vectors, efficiently transferring pre-trained knowledge to specific tasks. Extensive experiments conducted on multiple benchmark datasets demonstrate that GPP-CS consistently outperforms state-of-the-art baselines regarding inference accuracy and efficiency. In particular, GPP-CS achieves average improvements of 11.82% and 16.99% over the best baseline in terms of F1-score in the inductive and hybrid settings, respectively. Furthermore, GPP-CS exhibits strong robustness in low-resource scenarios.</div></div>\",\"PeriodicalId\":50365,\"journal\":{\"name\":\"Information Processing & Management\",\"volume\":\"62 6\",\"pages\":\"Article 104285\"},\"PeriodicalIF\":6.9000,\"publicationDate\":\"2025-07-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Processing & Management\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0306457325002262\",\"RegionNum\":1,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325002262","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Cascading multi-scale graph pre-training and prompt tuning for learning-based community search
Learning-based community search aims to identify the cohesive subgraph containing specified query nodes by embedding the hidden community pattern into node representations. Given the limited availability of labeled community samples, some approaches leverage the graph topological structure to train the graph encoder in a semi-supervised or unsupervised manner. However, these common training strategies can introduce learning biases such as conflicting community structures and the omission of distant members. Additionally, the lack of authentic and complete community examples as supervisory signals hinders the model's adaptation to specific tasks. To overcome these challenges, we propose Cascading Multi-Scale Graph Pre-training and Prompt Tuning for Community Search (GPP-CS), which integrates comprehensive pre-training objectives with lightweight prompt tuning to facilitate community-related knowledge learning. Specifically, the multi-scale graph pre-training combines context-aware and global-aware training strategies to mitigate biases in community pattern learning, equipping the graph encoder with well-initialized weights. The cohesiveness-aware prompt tuning employs the center points of potential communities to initialize the prompt vectors, efficiently transferring pre-trained knowledge to specific tasks. Extensive experiments conducted on multiple benchmark datasets demonstrate that GPP-CS consistently outperforms state-of-the-art baselines in terms of inference accuracy and efficiency. In particular, GPP-CS achieves average improvements of 11.82% and 16.99% over the best baseline in F1-score in the inductive and hybrid settings, respectively. Furthermore, GPP-CS exhibits strong robustness in low-resource scenarios.
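The abstract describes the cohesiveness-aware prompt tuning only at a high level, and the GPP-CS implementation is not included here. As a rough illustration, the following Python sketch shows one way the stated idea could be realized: prompt vectors initialized from the center points (mean embeddings) of candidate communities and tuned on top of a frozen, pre-trained graph encoder. All names (init_prompts_from_centers, PromptTuner) and design details are hypothetical assumptions, not the authors' code.

```python
# Illustrative sketch only -- the paper's code is not given in this abstract, so every
# name here (init_prompts_from_centers, PromptTuner) and every design detail is a
# hypothetical assumption, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def init_prompts_from_centers(node_emb: torch.Tensor,
                              candidate_communities: list) -> torch.Tensor:
    """Initialize one prompt vector per candidate community as the mean ("center point")
    of its members' embeddings from the frozen, pre-trained encoder."""
    centers = [node_emb[torch.tensor(members)].mean(dim=0)
               for members in candidate_communities]
    return torch.stack(centers)  # shape: (num_communities, dim)


class PromptTuner(nn.Module):
    """Lightweight prompt-tuning head: only the prompt vectors and a small scorer are
    trained, while the pre-trained graph encoder (not shown) stays frozen."""

    def __init__(self, prompt_init: torch.Tensor):
        super().__init__()
        self.prompts = nn.Parameter(prompt_init.clone())   # trainable prompt vectors
        self.scorer = nn.Linear(prompt_init.size(1), 1)    # per-node membership scorer

    def forward(self, node_emb: torch.Tensor, query: int) -> torch.Tensor:
        # Weight the prompts by their similarity to the query node's embedding.
        weights = F.softmax(node_emb[query] @ self.prompts.T, dim=-1)
        query_prompt = weights @ self.prompts                # (dim,)
        # Fuse the query-conditioned prompt into every node representation and score.
        fused = node_emb + query_prompt                      # broadcast over nodes
        return self.scorer(fused).squeeze(-1)                # per-node membership logits


if __name__ == "__main__":
    torch.manual_seed(0)
    num_nodes, dim = 50, 32
    node_emb = torch.randn(num_nodes, dim)        # stand-in for frozen encoder output
    communities = [[0, 1, 2, 3], [10, 11, 12], [20, 21, 22, 23, 24]]
    tuner = PromptTuner(init_prompts_from_centers(node_emb, communities))
    scores = tuner(node_emb, query=1)             # membership logits for node 1's community
    print(scores.shape)                           # torch.Size([50])
```

In such a setup, nodes whose logits exceed a threshold would form the predicted community for the query node, and only the prompt vectors and scorer would be updated from the few labeled communities available, which is what would keep the tuning stage lightweight relative to full fine-tuning.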
Journal introduction:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.