{"title":"A pseudo-labeling approach based on knowledge distillation for graph few-shot learning","authors":"Zongqian Wu , Peng Zhou , Guoqiu Wen , Xiaofeng Zhu","doi":"10.1016/j.ipm.2025.104268","DOIUrl":"10.1016/j.ipm.2025.104268","url":null,"abstract":"<div><div>Graph-based few-shot node classification (FSNC) has emerged as a promising solution to the challenge of limited labeled nodes in complex network analysis. Although existing pseudo-labeling FSNC methods have shown encouraging results, they often struggle with incorrect or over-confident pseudo-labels, which can negatively impact model generalization. To overcome these limitations, we propose PLD-FSNC, a novel pseudo-labeling FSNC framework leveraging knowledge distillation. Our PLD-FSNC framework is composed of two modules, <em>i.e.</em>, embedding transfer and pseudo-label improvement. The embedding transfer module transfers knowledge from a pre-trained source model to a target model, enhancing pseudo-label selection quality. The pseudo-label improvement module mitigates the impact of incorrect and over-confident pseudo-labels by using soft labels from the source model to supervise the target model’s predictions. We also provide theoretical justification for our pseudo-label improvement module and demonstrate its effectiveness through extensive experiments on six real-world datasets.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104268"},"PeriodicalIF":7.4,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144548925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Liberating the expressive capacity of deep hashing for image retrieval","authors":"Yinqi Chen , Yangting Zheng , Peiwen Li , Weijian Luo , Wenbin He , Shuo Kang","doi":"10.1016/j.ipm.2025.104277","DOIUrl":"10.1016/j.ipm.2025.104277","url":null,"abstract":"<div><div>Deep hashing has become prominent in the field of image retrieval due to its efficiency. However, existing approaches often assume hash bits to be independent and identically distributed (i.i.d.), and they typically seek to balance metric and quantization concerns to enhance efficiency. These outdated assumptions cause the hash code to degrade during training and weaken its ability to represent the number of categories from a modeling perspective. Overall, these practices limit the expressive capacity of the models. Thus, we propose <strong>L</strong>iberated <strong>E</strong>xpressive <strong>C</strong>apacity <strong>H</strong>ashing (LECH), a novel deep hashing framework that introduces two new concepts: 1) Symbolic representation, which aligns the Hamming space (+1/-1) with the positive and negative fields in real number space, representing the Hamming distance as the sign of the product; and 2) Bit correlation, which models dependencies between adjacent hash bits. Based on these concepts, LECH adopts symbolic representation in the output hash code and the Hamming distance calculation, introduces a module that uses multiplication to check signs, and models the metric learning of the hashing process as a conditional random field based on bit correlation. In LECH, the absence of quantization in the symbolic representation framework allows the network to fully learn discriminative ability, free of the conflict between quantization and metric objectives, while the incorporation of bit correlation expands the diversity of representable categories. These enhancements significantly amplify the expressive capacity of deep hashing. Experiments on benchmark datasets (e.g., ImageNet, MS-COCO, NUS-WIDE) demonstrate that LECH achieves a 3.05% improvement in mean average precision (mAP) over the baseline, with top results of 90.2% on ImageNet, 88.6% on MS-COCO, and 85.4% on NUS-WIDE under the 64-bit setting compared to state-of-the-art methods. These results highlight LECH’s superior performance and its potential to advance deep hashing for large-scale image retrieval.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104277"},"PeriodicalIF":7.4,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144548926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MedScaleRE-PF: a prompt-based framework with retrieval-augmented generation, chain-of-thought, and self-verification for scale-specific relation extraction in Chinese medical literature","authors":"Zhenli Chen , Jie Hao , Haixia Sun , Liang Zhao , Jiao Li , Qing Qian , Qinglong Peng , Xuwen Wang , Shan Cong , Liu Shen , Zhen Guo , Siyue Pu , Yan Lin","doi":"10.1016/j.ipm.2025.104278","DOIUrl":"10.1016/j.ipm.2025.104278","url":null,"abstract":"<div><div>Large language models have shown promise in biomedical natural language processing, yet their use in extracting structured knowledge from medical scales remains limited. This study introduces MedScaleRE-PF, a novel prompting framework designed for relation extraction in Chinese medical scale texts. The framework combines few-shot in-context learning with retrieval-augmented generation, chain-of-thought prompting, and self-verification strategies to improve contextual understanding and factual consistency. We constructed the CMedS-RE dataset, consisting of 606 full-text articles with 19,051 sentences, 29,359 annotated entities, and 7,217 relation instances. Experiments were conducted on two tasks: relational triple extraction (RTE) and relation classification (RC). We evaluated both single-step and multi-step prompting, along with four self-verification strategies: direct (D-SV), stepwise (S-CoT-SV), relation-specific (R-CoT-SV), and stepwise relation-specific (SR-CoT-SV). The best results were achieved with single-step prompting and the R-CoT-SV strategy, yielding F1 scores of 42.58% for RTE under the 32-shot setting and 65.42% for RC under the 8-shot setting. Compared to a RAG-only baseline, this configuration improved F1 by 7.59% on RTE and 1.07% on RC. Additional experiments demonstrated strong performance under annotation-scarce conditions, achieving 46.99% F1 on RTE with 20 training articles and 59.87% on RC with 50 articles. Ablation and error analyses further confirmed that task-specific prompt structure and verification design significantly impact performance under few-shot conditions. MedScaleRE-PF also showed consistent results across multiple LLMs, confirming its stability and generalizability. These findings highlight the effectiveness of combining simple prompting and CoT-inspired verification in domain-specific information extraction. MedScaleRE-PF offers a flexible and structured approach for mining medical scale knowledge and supports prompt-based development in biomedical applications.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104278"},"PeriodicalIF":7.4,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144548924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypergraph negative influence blocking maximization via influence estimation","authors":"Xian-Jie Zhang , Xiao-Ming Zhang , Xu-Dong Huang , Hai-Feng Zhang","doi":"10.1016/j.ipm.2025.104273","DOIUrl":"10.1016/j.ipm.2025.104273","url":null,"abstract":"<div><div>The rapid advancement of internet technology has accelerated the spread of negative information. To block the impact of such diffusion, selecting a set of positive seed nodes for competitive propagation is a viable strategy. However, existing studies have predominantly focused on pairwise user interactions, neglecting group relationships. To address this gap, we employ a hypergraph to model higher-order group interactions, defining the hypergraph influence blocking maximization (HIBM) problem and proposing the hypergraph competitive susceptible–infected (HCSI) diffusion model. We then prove the monotonicity and submodularity of the objective function, enabling the development of a greedy algorithm. To address the high computational complexity of the greedy approach, we further propose an efficient heuristic method, hypergraph competitive influence estimation (HCIE), which leverages a probabilistic formula to estimate the influence of negative seed sets under the blocking effect of positive nodes, thereby selecting positive nodes to minimize the propagation of negative information. Experimental results demonstrate that the HCIE method reduces the diffusion of negative information by over 10% on certain datasets, showcasing its effectiveness and robustness. Additionally, the HCIE method achieves performance close to that of the greedy algorithm while significantly reducing computational time.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104273"},"PeriodicalIF":7.4,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144534285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NiNet: A new invertible neural network architecture more suitable for deep image hiding","authors":"Zishun Ni , Hang Cheng , Jiaoling Chen , Yongliang Xu , Fei Chen , Meiqing Wang","doi":"10.1016/j.ipm.2025.104275","DOIUrl":"10.1016/j.ipm.2025.104275","url":null,"abstract":"<div><div>Image hiding through the application of invertible neural networks (INNs) represents a significant branch within the realm of deep image hiding methodologies, characterized by a compact network architecture and a streamlined parameter count. Nonetheless, when juxtaposed with autoencoder-based approaches, existing INN methods often result in inferior image quality. To surmount this challenge, this paper introduces a novel masking-based image hiding paradigm, establishes a new spatial domain transformation for images, and refines the Swin-transformer block. By integrating these innovations, an INN architecture is crafted that is particularly adept for deep image hiding, termed NiNet. The experimental results demonstrate that NiNet can remarkably address the problem of image hiding. In terms of steganographic image quality, NiNet outperforms the current state-of-the-art (SOTA) algorithms by 0.26 dB on the DIV2K dataset, 1.49 dB on the COCO dataset, and 0.39 dB on the ImageNet dataset. Regarding the quality of secret image recovery, NiNet surpasses the SOTA algorithms by 2.06 dB on DIV2K, 1.98 dB on COCO, and 0.50 dB on ImageNet.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104275"},"PeriodicalIF":7.4,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144548908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The role of recommendation algorithms in the formation of disinformation networks","authors":"Pau Muñoz , Raúl Barba-Rojas , Fernando Díez , Alejandro Bellogín","doi":"10.1016/j.ipm.2025.104243","DOIUrl":"10.1016/j.ipm.2025.104243","url":null,"abstract":"<div><div>Disinformation on social networks, especially those that share media content, remains a critical issue with far-reaching societal implications. Although extensive research has addressed the prevalence and mitigation of false information, the specific impact of recommendation algorithms on the creation and consolidation of disinformation networks has not been thoroughly examined. In this work, we bridge this gap by simulating how various recommendation techniques — ranging from basic yet foundational approaches such as popularity-based and content-based methods — shape network dynamics and facilitate disinformation spread. These classical algorithms are essential building blocks of modern hybrid and task-specific recommender systems; understanding their effects is thus crucial for assessing systemic risks. Using a dataset comprising tweets from 275 disinformation agents and 275 legitimate journalism agents, we conduct a realistic simulation grounded in probabilistic click models of user behavior and real-world social media data. Our findings reveal that certain recommendation approaches can significantly reinforce the cohesion and visibility of disinformation networks, thereby amplifying their reach. These results underscore the necessity for algorithmic accountability and the design of ethically responsible recommender systems to maintain information integrity on social platforms.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104243"},"PeriodicalIF":7.4,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144548907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tracing science-technology-linkages: A machine learning pipeline for extracting and matching patent in-text references to scientific publications","authors":"Zahra Abbasiantaeb , Suzan Verberne , Jian Wang","doi":"10.1016/j.ipm.2025.104264","DOIUrl":"10.1016/j.ipm.2025.104264","url":null,"abstract":"<div><div>Patent references to science provide a valuable paper trail for investigating the knowledge flow from science to technological innovation. Research on patent–paper links has mostly concentrated on front-page references, often neglecting the more complex in-text references. Therefore, we developed a three-stage machine-learning pipeline to extract and match patent in-text references to scientific publications. Our pipeline performs the following tasks: (1) extracting reference strings from patent texts, (2) parsing fields from these reference strings, and (3) matching references to publications in the Web of Science (WoS) database. We developed a training dataset consisting of 3,900 (and 3,901) manually annotated references from 392 (and 319) randomly selected EPO (and USPTO) patents. The first stage, reference extraction, achieved almost perfect results with a precision of 98.9% and a recall of 97.7% at the reference level. Overall, the pipeline demonstrated robust performance, with a precision of 96.8% and a recall of 91.9% at the unique patent-paper-pair level. Applying this pipeline to EPO and USPTO patents granted between 1990 and 2022, we identified 5,438,836 (and 20,432,189) references from 492,469 (and 1,449,398) EPO (and USPTO) patents, 2,763,779 (and 11,069,995) of which are matched to WoS publications. This extensive dataset is a valuable resource for studying science-technology linkages. We offer open access to this dataset, along with the associated code and training data.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104264"},"PeriodicalIF":7.4,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144523668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DelphiAgent: A trustworthy multi-agent verification framework for automated fact verification","authors":"Cheng Xiong , Gengfeng Zheng , Xiao Ma , Chunlin Li , Jiangfeng Zeng","doi":"10.1016/j.ipm.2025.104241","DOIUrl":"10.1016/j.ipm.2025.104241","url":null,"abstract":"<div><div>Large Language Models (LLMs) have been investigated for many reasoning-intensive tasks, including fact verification, and have exhibited outstanding performance by coupling internal and external knowledge. However, non-agentic LLM-based methods produce responses from direct prompts in a one-off manner, suffering from challenges in factuality and hallucinations. In this paper, we propose DelphiAgent, an innovative agentic framework for trustworthy fact-checking that employs multiple LLMs to emulate the workflow of the Delphi method, aiming to enhance transparency in the decision-making procedure and mitigate hallucinations when generating justifications. This is implemented through a dual-system framework that integrates the evidence mining module and the Delphi decision-making module. The evidence mining module extracts evidence from raw uncensored reports and refines it, ensuring the provision of instructive rationales for the subsequent module. Meanwhile, drawing inspiration from the Delphi method, the decision-making module devises multiple LLM-based agents with distinct personalities that make factuality judgments individually based on the claim and its verified evidence, reaching a consensus through multiple rounds of feedback and synthesis. The experimental findings from two challenging datasets indicate that DelphiAgent not only surpasses current LLM-based approaches but is also on par with state-of-the-art LLM-enhanced supervised baselines without necessitating a training regime, with macF1 improvements reaching up to 6.84% on RAWFC and comparable performance on LIAR-RAW. Furthermore, the generated justifications throughout the workflow underscore the trustworthiness of our proposed framework. The official implementation of this paper is available at <span><span>https://github.com/zjfgh2015/DelphiAgent</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104241"},"PeriodicalIF":7.4,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144523670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accuracy and beyond-accuracy perspectives of controllable multi-objective recommender systems","authors":"Patrik Dokoupil , Ludovico Boratto , Ladislav Peška","doi":"10.1016/j.ipm.2025.104267","DOIUrl":"10.1016/j.ipm.2025.104267","url":null,"abstract":"<div><div>Multi-objective recommender systems (MORS) aim to optimize multiple quality criteria while generating recommendations. This opens the opportunity of letting <em>individual</em> users control their recommendations by specifying their propensities towards the considered objectives. However, to grasp this opportunity, the propensity feedback has to be interpretable by MORS and respected in the resulting recommendations. This paper presents the results of a user study (208 valid participants) that assessed the performance of nine MORS algorithms tailored to individual preferences, where users were allowed to set their propensities via a dedicated interface. In the analysis, we focused on how individual algorithms performed w.r.t. accuracy and beyond-accuracy metrics, whether participants utilized the possibility to adapt algorithms, and to what extent MORS respected these propensities. The findings indicate that while relevance-based recommendations often outperform MORS w.r.t. short-term consumption, MORS variants maintain higher values of beyond-accuracy metrics. This includes improvements w.r.t. serendipity, which in turn had a positive impact on the overall user satisfaction with the provided recommendations. Moreover, we observed that more complex and more time-consuming evolutionary MORS algorithms do not bring any benefits w.r.t. proportionality towards the user’s propensities, as compared to simpler greedy or item-wise approaches.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104267"},"PeriodicalIF":7.4,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144523753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing job recommendations with LLM-based resume completion: A behavior-denoised alignment approach","authors":"Chen Zhu , Xiao Hu , Han Wu , Chuan Qin , Hengshu Zhu , Hui Xiong","doi":"10.1016/j.ipm.2025.104261","DOIUrl":"10.1016/j.ipm.2025.104261","url":null,"abstract":"<div><div>Personalized job recommender systems are pivotal in connecting job-seekers with suitable jobs in online recruitment services, significantly impacting the efficiency of the recruitment process. Despite the remarkable progress of existing job recommendation algorithms, they struggle with the delayed availability of updated resumes, which complicates the profiling of job-seeker preferences. Recent advancements in Large Language Models (LLMs) for text generation offer a straightforward solution through automatic resume completion for job-seekers. This enables a model-agnostic method to tackle the problem of outdated resumes in job recommendations. To this end, we propose a user behavior-based preference alignment framework for fine-tuning LLMs to benefit job recommendations through resume completion. Meanwhile, to mitigate the influence of noise in behavioral data (i.e., bias and variance), we propose a noise-robust LLM alignment method, named Denoised Direct Preference Optimization (Denoised DPO). This method can effectively disentangle genuine user preferences from noisy behavioral data. Specifically, we first design a novel reward function for preference estimation by combining an LLM-based component for real user preference with a regression model for bias disentanglement. Moreover, we develop a Thurstonian-style model for job-seekers’ preference modeling to stabilize data reliability amidst behavior variances. Finally, to evaluate our approach, we conducted extensive offline and online experiments. In the offline experiments, we constructed real-world datasets containing more than 1 million users, 40 million jobs, and 500 million interaction records from one of the largest recruitment platforms in China. We evaluated our method on two classic recommendation paradigms, and the results show that it brings a 1%–6% improvement across various metrics (i.e., AUC, NDCG, HR, and MAP). In the online experiments, we deployed our method on this recruitment platform for one week; compared with the baseline, it achieved a 7.52% improvement in clicks per user and a 13.29% improvement in conversions per user.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 6","pages":"Article 104261"},"PeriodicalIF":7.4,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144491011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}