{"title":"TEDRec: Transformer-based scientific collaborator recommendation via textual-edge dynamic network modeling","authors":"Keqin Guan , Weiye Huang , Ting Chen , Wai Kin (Victor) Chan","doi":"10.1016/j.ipm.2025.104283","DOIUrl":"10.1016/j.ipm.2025.104283","url":null,"abstract":"<div><div>In academia, scientific cooperation enhances the quality of research and strengthens scholars’ profiles. However, the information overload in the big data era poses challenges in recommending suitable collaborators for scholars or projects. Most existing collaborator recommendation methods prioritize structural dependencies and neglect the temporal details and semantic information. In this study, we leverage the Transformer’s self-attention mechanism to develop a novel academic collaborator recommendation framework using the textual-edge dynamic network called TEDRec. It comprehensively analyzes the collaboration network through three key aspects (i.e., temporality, textual edges, and structural relationships) to learn more valuable author representations, thereby providing scholars with the most suitable partners for future research or projects. Subsequently, several empirical experiments are conducted on three constructed datasets to evaluate their effectiveness. The results validate that the proposed framework does achieve superb performance over baseline models across all evaluation metrics, indicating excellent generalization and robustness.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104283"},"PeriodicalIF":7.4,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144655655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A method for noise-suppressed multimodal feature integration in urban scene detection","authors":"Xue-juan Han , Zhong Qu , Shu-fang Xia","doi":"10.1016/j.ipm.2025.104290","DOIUrl":"10.1016/j.ipm.2025.104290","url":null,"abstract":"<div><div>The complementary imaging properties of visible and thermal infrared make them play a crucial role in multimodal object detection. Multimodal fusion methods that do not effectively deal with intra-modal and inter-modal noise interference can lead to degraded detection performance. To address this problem, we propose a generic multimodal object detection architecture. The noise within the input feature modality is first weakened by the Noise Suppression and Score-guided Fusion module (NSSFuse), while the intra-modal and inter-modal feature representations are enriched, thus facilitating the global interaction of multimodal features. Then the multimodal low-frequency features and high-frequency features are efficiently fused by the Multimodal Frequency Fusion module (MutiFreqFuse), which retains the key information while suppressing the inter-modal irrelevant noise to further enhance the multimodal feature fusion. Numerous experimental results validate the superiority of the model on the benchmark datasets, Multi-Modal Multi-Feature for Traffic Detection (M3FD) and Forward-Looking InfraRed (FLIR). The mean Average Precision (<em>mAP</em>) improves by 4.4–6.8% over the baseline models and is up to 6.3% higher than that of the most recent multimodal models.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104290"},"PeriodicalIF":7.4,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144655641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SBlur: An Obfuscation Approach for Preserving Sensitive Attributes in Recommender System","authors":"Falguni Roy , Na Zhao , Xiaofeng Ding","doi":"10.1016/j.ipm.2025.104282","DOIUrl":"10.1016/j.ipm.2025.104282","url":null,"abstract":"<div><div>User interaction in the recommender system is treated as a way of expressing user preferences, which later serve as input to provide more accurate recommendations. However, such interaction data can be exploited to infer user private attributes, including gender, age, and personality traits, posing significant privacy implications. Existing obfuscation-based approaches endeavor to mitigate these vulnerabilities by adding or removing interactions from user profiles before or during recommender algorithm training. Nevertheless, these methods often compromise recommendation accuracy while facing challenges such as the cold-start user problem and the “rich get richer” effect, undermining recommendation diversity. To address these constraints, we propose SBlur, a strategic obfuscation approach designed to preserve users’ attribute privacy while balancing the privacy-accuracy-fairness trade-off and enhancing diversity. SBlur conceals gender inference attacks by strategically adding and removing items, supported by a combined similarity measure that integrates rating-based and genre preference-based similarities. This combined similarity enables precise user profile personalization for obfuscation, particularly in cold-start scenarios. We evaluate SBlur using three popular datasets (ML100k, ML1M, and Yahoo!Movie) and three state-of-the-art recommendation algorithms (UserKNN, ALS, and BPRMF). Experimental results demonstrate that SBlur achieves a balanced trade-off between privacy, recommendation accuracy, and fairness while promoting recommendation diversity.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104282"},"PeriodicalIF":7.4,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144632450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

{"title":"Cascading multi-scale graph pre-training and prompt tuning for learning-based community search","authors":"Chonghao Chen , Jianming Zheng , Wanyu Chen , Xin Zhang , Yupu Guo , Aimin Luo , Fei Cai","doi":"10.1016/j.ipm.2025.104285","DOIUrl":"10.1016/j.ipm.2025.104285","url":null,"abstract":"<div><div>Learning-based community search aims to identify the cohesive subgraph containing specified query nodes through embedding the hidden community pattern into node representations. Given the limited availability of labeled community samples, some approaches leverage the graph topological structure to train the graph encoder in a semi-supervised or unsupervised learning manner. However, the common training strategies can result in the learning biases such as conflicting community structures and distant member omission. Additionally, the lack of authentic and complete community examples as supervisory signals hinders model’s adaptation to specific tasks. To overcome these challenges, we propose Cascading Multi-Scale <strong>G</strong>raph <strong>P</strong>re-training and <strong>P</strong>rompt Tuning for <strong>C</strong>ommunity <strong>S</strong>earch (<strong>GPP-CS</strong>), which integrates comprehensive pre-training objectives and lightweight prompt tuning to facilitate the community-related knowledge learning. Specially, the multi-scale graph pre-training leverages combining context-aware and global-aware training strategies to mitigate biases in the community pattern learning, equipping the graph encoder with well-initialized weights. The cohesiveness-aware prompt tuning employs the center points of potential communities to initialize the prompt vectors, efficiently transferring pre-trained knowledge to specific tasks. Extensive experiments conducted on multiple benchmark datasets demonstrate that GPP-CS consistently outperforms state-of-the-art baselines regarding inference accuracy and efficiency. In particular, GPP-CS achieves average improvements of 11.82% and 16.99% over the best baseline in terms of F1-score in the inductive and hybrid settings, respectively. Furthermore, GPP-CS exhibits strong robustness in low-resource scenarios.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104285"},"PeriodicalIF":7.4,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144632610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TreeXformer: Extracting tabular feature-context information using tree-structured semantics","authors":"Yinhong Li , Hanwen Qu , Chen Chen , Xiaoyi Lv , Enguang Zuo , Kui Wang , Xulun Cai","doi":"10.1016/j.ipm.2025.104291","DOIUrl":"10.1016/j.ipm.2025.104291","url":null,"abstract":"<div><div>Tabular classification learning aims to support decision-making in fields such as finance and recommendation systems by processing various types of structured features in tabular data. Most existing models rely on the multi-layer non-linear structures of deep neural networks to automatically extract feature interactions. However, the heterogeneity of tabular features often leads to the neglect of feature-context information, resulting in redundant or insufficient interactions that degrade model performance. Enhancing the modeling of contextual relationships between features can improve the model’s ability to interpret heterogeneous features effectively. To address this, we propose the TreeXformer model, a customized Transformer network that introduces, for the first time, an abstract tree-structured semantic representation to capture feature-context information. We develop a Tree Graph Estimator (TGE) to construct the tree-structured semantics of features and employ the Guided Interaction Attention (GIA) to facilitate feature interactions. A mean operation is applied across feature dimensions to aggregate global semantic information, improving the model’s interpretability and enhancing the transparency of its decision-making process. Extensive experiments on five public datasets and one private dataset demonstrate that TreeXformer significantly improves model performance, proving its effectiveness and superiority in capturing complex feature relationships. Ultimately, TreeXformer not only enhances classification outcomes but also strengthens model interpretability.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104291"},"PeriodicalIF":7.4,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144614280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A systematic mapping study of information retrieval-based requirements traceability methods","authors":"Hongyan Wan , Xinyu He , Yang Deng , Bangchao Wang","doi":"10.1016/j.ipm.2025.104287","DOIUrl":"10.1016/j.ipm.2025.104287","url":null,"abstract":"<div><div>Requirements traceability (RT) is critical for ensuring consistency, quality, and maintainability in software development. While learning-based approaches have gained increasing attention, traditional information retrieval (IR) methods remain widely used in practice. However, existing literature lacks a systematic synthesis of their best practices and recent advancements. To address this gap, we conducted a systematic mapping study (SMS) of 40 primary studies published between 2014 and 2024, selected from an initial pool of 2,052 publications. Our review examines widely adopted IR models, enhancement strategies, evaluation datasets, performance metrics, and baseline methods. Specifically, we identify and categorize 32 representative enhancement strategies into four methodological types: (1) artifact text information, (2) artifact structural information, (3) model-based optimization, and (4) human intervention. Furthermore, we analyze 53 commonly used datasets and 9 evaluation metrics for validation. Our findings indicate that among various IR models, the Vector Space Model (VSM) and Latent Semantic Indexing (LSI) typically achieve stronger performance in RT tasks. This study provides a comprehensive synthesis of IR-based RT research and offers practical insights to advance traceability in software engineering.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104287"},"PeriodicalIF":7.4,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144597129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing long-form question answering via reflection with question decomposition","authors":"Junjie Xiao , Wei Wu , Jiaxu Zhao , Meng Fang , Jianxin Wang","doi":"10.1016/j.ipm.2025.104274","DOIUrl":"10.1016/j.ipm.2025.104274","url":null,"abstract":"<div><div>Long-Form Question Answering (LFQA) requires multi-paragraph responses that explain, contextualize and justify an answer rather than returning a single fact. Large proprietary language models can meet this bar, but privacy, cost and hardware limits often force practitioners to rely on much smaller, locally hosted models — whose outputs are typically shallow or incomplete. We introduce Decomposition-Reflection, a training-free prompting framework that (i) decomposes a user question into the complementary sub-questions, (ii) answers each one, and (iii) runs a lightweight self-reflection loop after every stage to enhance the comprehensiveness, entailment and factuality of the results before synthesizing the final response. Across three LFQA benchmarks, the proposed approach raises ROUGE and LLM-based factuality scores over strong chain-of-thought and self-refinement baselines. Ablation study confirms that removing either decomposition or reflection sharply degrades coverage and entailment, underscoring the importance of both components.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104274"},"PeriodicalIF":7.4,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144587679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large Language Models and Data Quality for Knowledge Graphs","authors":"Stefano Marchesin , Gianmaria Silvello , Omar Alonso","doi":"10.1016/j.ipm.2025.104281","DOIUrl":"10.1016/j.ipm.2025.104281","url":null,"abstract":"<div><div>Knowledge Graphs (KGs) have become essential for applications such as virtual assistants, web search, reasoning, and information access and management. Prominent examples include Wikidata, DBpedia, YAGO, and NELL, which large companies widely use for structuring and integrating data. Constructing KGs involves various AI-driven processes, including data integration, entity recognition, relation extraction, and active learning. However, automated methods often lead to sparsity and inaccuracies, making rigorous KG quality evaluation crucial for improving construction methodologies and ensuring reliable downstream applications. Despite its importance, large-scale KG quality assessment remains an underexplored research area.</div><div>The rise of Large Language Models (LLMs) introduces both opportunities and challenges for KG construction and evaluation. LLMs can enhance contextual understanding and reasoning in KG systems but also pose risks, such as introducing misinformation or “hallucinations” that could degrade KG integrity. Effectively integrating LLMs into KG workflows requires robust quality control mechanisms to manage errors and ensure trustworthiness.</div><div>This special issue explores the intersection of KGs and LLMs, emphasizing human–machine collaboration for KG construction and evaluation. We present contributions on LLM-assisted KG generation, large-scale KG quality assessment, and quality control mechanisms for mitigating LLM-induced errors. Topics covered include KG construction methodologies, LLM deployment in KG systems, scalable KG evaluation, human-in-the-loop approaches, domain-specific applications, and industrial KG maintenance. By advancing research in these areas, this issue fosters innovation at the convergence of KGs and LLMs.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104281"},"PeriodicalIF":6.9,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144724160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based centrality framework for effective multi-video summarization","authors":"Aziz Qaroush, Mohammad Jubran, Qutaiba Olayan","doi":"10.1016/j.ipm.2025.104276","DOIUrl":"10.1016/j.ipm.2025.104276","url":null,"abstract":"<div><div>The exponential growth of video content presents substantial challenges in summarizing and retrieving relevant information, particularly in multi-video scenarios involving heterogeneous sources. This paper presents an unsupervised, graph-based centrality framework for multi-video summarization. Segment representations are extracted using 3D Convolutional Neural Networks (3DCNNs) to capture both spatial and temporal features. We introduce three novel ranking algorithms — Weighted Degree Centrality (WDC), V-Rank, and VL-Rank — extending classical methods such as Degree Centrality, PageRank, and LexRank. These algorithms incorporate visual saliency, motion, and semantic similarity to ensure relevance, diversity, and structural representativeness. The framework comprises four stages: segmentation, graph construction, ranking, and selection. We provide a detailed computational analysis, including time complexity and convergence behavior. VL-Rank achieves significantly faster convergence than PageRank through a normalized propagation scheme, while WDC offers a highly efficient, non-iterative alternative. Evaluations on the Tour20 dataset demonstrate that the proposed methods outperform state-of-the-art approaches, with WDC achieving a mean F1 score of 0.741 compared to 0.680 for the Multi-Stream baseline. The framework is both effective and scalable, making it suitable for large-scale or real-time applications.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104276"},"PeriodicalIF":7.4,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144556640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}