Information Processing & Management: Latest Articles (Page 6)

Unsupervised contrastive domain adaptive rumor detection with test-time classifier adjustment

IF 6.9 | Zone 1, Management

Information Processing & Management | Pub Date: 2025-08-29 | DOI: 10.1016/j.ipm.2025.104341

Hongyan Ran, Di Zhang, Xiaohong Li, Huifang Ma, Caiyan Jia, Yaogong Feng

Volume 63, Issue 2, Article 104341

Abstract: Domain-adaptive rumor detection faces significant challenges in mitigating distributional shifts between the source and target domains. Although contrastive learning-based models have shown promise, they exhibit two fundamental shortcomings. First, they neglect the impact of source content on feature alignment, which may hinder discriminative feature learning. Second, they rely on the assumption of an unbiased classifier despite inherent distributional discrepancies in the target data. To address these challenges, we propose a novel method called Unsupervised Contrastive Domain Adaptive Rumor Detection with Test-Time Classifier Adjustment (CDTT). Our contrastive domain adaptation framework uses a stance-based contrastive learning mechanism to align latent stance features across domains while remaining independent of content. Additionally, to address the unavailability of labels in the target domain, we devise a pseudo-label generation strategy that aggregates nearest-neighbor probabilities through distance-based batch soft voting in feature space. Finally, we implement a test-time adaptation strategy that refines the source-trained classifier by constructing class-wise pseudo-prototypes from unlabeled target data and refining predictions through distance-based sample classification. Extensive experiments on four groups of cross-domain datasets and a cross-event dataset show that our model outperforms state-of-the-art baselines.
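The two target-domain steps in this abstract, batch soft voting for pseudo-labels and classification against class-wise pseudo-prototypes, can be illustrated with a minimal sketch. It is not the CDTT implementation: Euclidean distance, inverse-distance weighting, the neighbour count `k`, and every function name here are assumptions, and the toy features and probabilities stand in for encoder outputs and source-classifier predictions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def soft_vote_pseudo_labels(features, probs, k=3, eps=1e-8):
    """Pseudo-label each batch sample by an inverse-distance-weighted average
    of the class probabilities of its k nearest neighbours in the batch
    (the sample itself is excluded). Returns (label, soft_probs) pairs."""
    n, num_classes = len(features), len(probs[0])
    results = []
    for i in range(n):
        neighbours = sorted(
            (euclidean(features[i], features[j]), j) for j in range(n) if j != i
        )[:k]
        agg = [0.0] * num_classes
        total_weight = 0.0
        for dist, j in neighbours:
            w = 1.0 / (dist + eps)  # closer neighbours get larger votes
            total_weight += w
            for c in range(num_classes):
                agg[c] += w * probs[j][c]
        agg = [v / total_weight for v in agg]
        label = max(range(num_classes), key=lambda c: agg[c])
        results.append((label, agg))
    return results


def class_prototypes(features, labels, num_classes):
    """Mean feature vector per pseudo-labelled class (None if a class is empty)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for f, y in zip(features, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += f[d]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(num_classes)
    ]


def nearest_prototype(feature, prototypes):
    """Classify a sample by its closest non-empty class prototype."""
    scored = [(euclidean(feature, p), c) for c, p in enumerate(prototypes) if p is not None]
    return min(scored)[1]


if __name__ == "__main__":
    # Toy batch: two well-separated clusters with weak source-model predictions.
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    probs = [[0.6, 0.4], [0.7, 0.3], [0.55, 0.45], [0.3, 0.7], [0.4, 0.6], [0.2, 0.8]]
    labels = [y for y, _ in soft_vote_pseudo_labels(feats, probs, k=2)]
    protos = class_prototypes(feats, labels, num_classes=2)
    print(labels, nearest_prototype([4.0, 4.2], protos))
```

Leaving each sample out of its own neighbourhood keeps the vote from simply echoing that sample's own prediction; the abstract does not say whether CDTT does this.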

Citations: 0

Leveraging dynamic few-shot prompting and ensemble method for task-oriented dialogue with subjective knowledge

IF 6.9 | Zone 1, Management

Information Processing & Management | Pub Date: 2025-08-28 | DOI: 10.1016/j.ipm.2025.104317

Dongning Rao, Jietao Zhuang, Zhihua Jiang

Volume 63, Issue 2, Article 104317

Abstract: Subjective knowledge is key to meeting customer needs. The Subjective Knowledge-grounded Task-oriented Dialogue (SK-TOD) task therefore aims to handle subjective user requests such as “Does the restaurant have a good atmosphere?” by selecting relevant subjective knowledge snippets and generating appropriate responses. However, unlike existing methods such as retrieval-augmented generation over external objective knowledge, SK-TOD requires selecting subjective knowledge and summarizing opinions from reviews within a specified scope, which poses new challenges. This paper therefore proposes DESIGN (Dynamic fEw-Shot promptInG and eNsemble) for SK-TOD. Specifically, DESIGN first applies Aspect-Based Sentiment Analysis (ABSA) to enrich subjective knowledge snippets and then builds an ensemble of diverse base models, including both classification and generative models, for knowledge selection (KS). For response generation (RG), DESIGN employs generative models conditioned on the dialogue context and the ABSA-enhanced knowledge. In particular, we devise a similarity-alignment sample selection algorithm that dynamically chooses similar samples for the few-shot prompts used in KS and RG. We experiment on the 11th Dialog System Technology Challenge (DSTC11) SK-TOD benchmark and on an extended dataset, ReDial, with 6147 instances. For KS, DESIGN beats the DSTC11 winner and improves F1 by 7% over the baseline, reaching 86.16%. For RG, DESIGN outperforms the baselines and the DSTC11 winner across eight metrics; for example, it improves entailment performance by 5% over the DSTC11 winner and by 10% over the baseline.
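Dynamic few-shot prompting selects demonstrations per query instead of fixing them in advance. The sketch below shows that general pattern plus a majority-vote ensemble. The abstract does not detail the similarity-alignment algorithm or how the ensemble combines its base models, so bag-of-words cosine similarity and a strict majority vote are swapped in as stand-ins; the example pool, its outputs, and the prompt template are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; punctuation is ignored."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_demonstrations(query, pool, k=2):
    """Return the k pool examples whose inputs are most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bag_of_words(ex["input"])), reverse=True)
    return ranked[:k]


def build_prompt(query, demos):
    """Concatenate the selected demonstrations and the new query into one prompt."""
    blocks = [f"Input: {d['input']}\nOutput: {d['output']}" for d in demos]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def ensemble_vote(decisions):
    """Strict-majority vote over binary knowledge-selection decisions (True = relevant)."""
    return sum(bool(d) for d in decisions) * 2 > len(decisions)


# Hypothetical demonstration pool.
POOL = [
    {"input": "Does the hotel have free parking?", "output": "Reviewers say parking is free."},
    {"input": "Is the restaurant atmosphere cozy?", "output": "Most reviews describe it as cozy."},
    {"input": "What time does the museum open?", "output": "It opens at 9 a.m."},
]

if __name__ == "__main__":
    query = "Does the restaurant have a good atmosphere?"
    print(build_prompt(query, select_demonstrations(query, POOL, k=2)))
```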

Citations: 0

Defining the problem: The impact of OCR quality on retrieval-augmented generation performance and strategies for improvement

IF 6.9 | Zone 1, Management

Information Processing & Management | Pub Date: 2025-08-27 | DOI: 10.1016/j.ipm.2025.104368

Minchae Song

Volume 63, Issue 1, Article 104368

Abstract: Despite considerable progress in Retrieval-Augmented Generation (RAG) and Optical Character Recognition (OCR) technologies, little research has examined how OCR-derived data influence RAG performance. This study therefore presents a document-based question-answering dataset built from unstructured image documents across financial domains and investigates the impact of OCR-generated data on RAG outcomes. Although high OCR accuracy was achieved, especially for handwritten content, feeding raw OCR outputs directly into the RAG pipeline substantially increased error rates. To address this, we propose a simple yet effective method that transforms OCR outputs into a structured tabular format; the results show a marked improvement in RAG performance without any change in OCR quality. The approach proved robust when correcting OCR errors, representing data in structured formats, and integrating alternative retriever and reranker techniques. The results also highlight that RAG performance is more sensitive to the structure of the input data than to OCR accuracy alone. This study presents a practical solution for optimizing RAG systems through structured representations of OCR-extracted data, offering new insights into integrating OCR and RAG.
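The abstract does not describe the table schema used to structure OCR output. As a hypothetical illustration only, the sketch below assumes OCR text arrives as `field: value` lines and renders them as a two-column Markdown table before indexing; lines without a colon are dropped, which a real pipeline would likely need to handle differently.

```python
import re

# Hypothetical pattern: "<field> : <value>", field may not contain ':' or '|'.
FIELD_LINE = re.compile(r"^\s*([^:|]+?)\s*:\s*(.+?)\s*$")


def ocr_lines_to_markdown_table(lines):
    """Convert 'field: value' OCR lines into a two-column Markdown table.
    Lines that do not match the pattern are skipped."""
    rows = []
    for line in lines:
        match = FIELD_LINE.match(line)
        if match:
            field, value = match.groups()
            rows.append((field, value.replace("|", "\\|")))  # escape table delimiters
    table = ["| Field | Value |", "| --- | --- |"]
    table += [f"| {field} | {value} |" for field, value in rows]
    return "\n".join(table)


if __name__ == "__main__":
    raw = ["Account Holder: Jane Doe", "Balance : 1,200.50", "~~ smudged ~~"]
    print(ocr_lines_to_markdown_table(raw))
```

Keeping each field and its value on one table row is one way to preserve the key-value pairing that flat OCR text tends to lose; whether this matches the paper's format is not stated.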

Citations: 0

Personalized semi-decentralized federated recommender

IF 6.9 | Zone 1, Management

Information Processing & Management | Pub Date: 2025-08-27 | DOI: 10.1016/j.ipm.2025.104360

Jiayu Bao, Yicheng Di, Song Shen, Rongsheng Hu, Yuan Liu

Volume 63, Issue 2, Article 104360

Abstract: Recently proposed federated recommender systems can alleviate privacy concerns; however, existing methods either rely on third-party servers to access other isolated graphs or restrict local training to isolated graphs. A key challenge in federated learning (FL) is statistical heterogeneity, which can undermine the generalization ability of the global model across clients. To address these issues, we propose pFedSG, a novel semi-decentralized federated recommender framework with adaptive local aggregation. The framework improves scalability through device-to-device collaboration and enhances local subgraphs by connecting isolated graphs with predicted item-node connections, thereby preserving high-order user-item collaborative information. Furthermore, we introduce a fine-grained personalization (FGP) module that adaptively aggregates the downloaded global model and the local model for each client according to its local objective, enabling effective learning of fine-grained personalization for users and items. To evaluate pFedSG, we conducted extensive experiments on four public datasets, where it significantly outperformed ten benchmark models. Compared with the best baseline, pFedSG improved the HR and NDCG evaluation metrics by 7.37% and 6.51%, respectively. Additionally, pFedSG can be applied to existing graph neural network-based federated recommender methods. Further experiments validate the superiority of pFedSG from multiple analytical perspectives.
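The fine-grained personalization module adaptively mixes the downloaded global model with each client's local model. One common way to express such mixing, assumed here rather than taken from the paper, is element-wise interpolation with per-parameter weights in [0, 1] that are tuned on a local objective. The toy squared-error objective and the `target` vector below are hypothetical stand-ins for a client's real training loss.

```python
def adaptive_local_aggregation(local, global_, weights):
    """Element-wise mix: theta = local + w * (global - local), with each w clipped
    to [0, 1] so 0 keeps the local parameter and 1 adopts the global one."""
    if not len(local) == len(global_) == len(weights):
        raise ValueError("parameter and weight vectors must have equal length")
    return [
        loc + min(1.0, max(0.0, w)) * (glo - loc)
        for loc, glo, w in zip(local, global_, weights)
    ]


def update_weights(local, global_, weights, target, lr=0.1):
    """One gradient step on a toy local objective sum((theta - target)^2).
    d loss / d w_i = 2 * (theta_i - target_i) * (global_i - local_i)."""
    theta = adaptive_local_aggregation(local, global_, weights)
    return [
        min(1.0, max(0.0, w - lr * 2.0 * (t - y) * (glo - loc)))
        for w, t, y, glo, loc in zip(weights, theta, target, global_, local)
    ]


def squared_error(theta, target):
    """Toy local loss used only to drive the weight update."""
    return sum((t - y) ** 2 for t, y in zip(theta, target))


if __name__ == "__main__":
    local, glob, target = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [3.0, 0.0, 1.0]
    w = [0.5, 1.0, 0.0]
    for step in range(3):
        w = update_weights(local, glob, w, target)
        print(step, w, squared_error(adaptive_local_aggregation(local, glob, w), target))
```

Learning one weight per parameter, rather than a single scalar per client, is what makes the mixing "fine-grained" in this sketch.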

Citations: 0

Gaussian Decay Centrality: A quantum-inspired method for identifying important nodes in complex networks

IF 6.9 | Zone 1, Management

Information Processing & Management | Pub Date: 2025-08-27 | DOI: 10.1016/j.ipm.2025.104366

Yusong Liu, Haoming Guo, Xuefeng Yan

Volume 63, Issue 2, Article 104366

Abstract: In complex networks, critical nodes play a pivotal role in facilitating information propagation. Traditional methods for characterizing node importance often capture dynamic attributes in a distorted way. To address this, we developed a novel method for evaluating node importance, inspired by the probability density of Gaussian wave packets. The method establishes a Gaussian decay mechanism based on wave packet dynamics, which quantitatively models node importance as decaying exponentially with the square of topological distance. It also incorporates a path weight operator derived from the geometric mean of node degrees to capture the conduction enhancement effect between hub nodes, and it introduces an initial influence distribution driven by eigenvector centrality to characterize the intrinsic propagation potential of nodes. Experiments were conducted on 8 real-world networks and 45 synthetic networks. Using the true rankings obtained from the SIR model, we calculated Kendall's correlation coefficient τ between the rankings generated by different methods and the true rankings. The proposed method achieved the best results on multiple networks, and its τ values improved steadily as the infection rate in the SIR model increased. Experiments further confirmed that the seed nodes selected by our method achieved wider propagation coverage in real-world social networks, highlighting its practical value in real-world information dissemination scenarios. In addition, a comprehensive analysis using MI and RDF experiments confirmed that the proposed method achieves optimal monotonicity in its ranking results.
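The three ingredients described in this abstract (Gaussian decay in squared distance, a geometric-mean degree weight, and eigenvector-centrality initial influence) can be combined into a scoring function. The exact formula is not given in the abstract; this sketch assumes score(i) = Σ_j x_j · √(k_i k_j) · exp(-d_ij² / 2σ²) over nodes j within a hypothetical cut-off radius, where x_j is eigenvector centrality, k is degree, and d is shortest-path distance. Applying the geometric mean to the two endpoint degrees rather than along the whole path is also an assumption.

```python
import math
from collections import deque


def bfs_distances(adj, src):
    """Shortest-path hop counts from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eigenvector_centrality(adj, iters=500, tol=1e-10):
    """Power iteration on (A + I): same leading eigenvector as A, but the
    shift avoids oscillation on bipartite graphs such as stars."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values())) or 1.0
        nxt = {v: val / norm for v, val in nxt.items()}
        if max(abs(nxt[v] - x[v]) for v in adj) < tol:
            return nxt
        x = nxt
    return x


def gaussian_decay_centrality(adj, sigma=1.0, radius=3):
    """Hypothetical score: sum_j x_j * sqrt(k_i * k_j) * exp(-d^2 / (2 sigma^2))."""
    ec = eigenvector_centrality(adj)
    deg = {v: len(adj[v]) for v in adj}
    scores = {}
    for i in adj:
        total = 0.0
        for j, d in bfs_distances(adj, i).items():
            if j == i or d > radius:
                continue
            path_weight = math.sqrt(deg[i] * deg[j])  # geometric mean of endpoint degrees
            total += ec[j] * path_weight * math.exp(-d * d / (2 * sigma ** 2))
        scores[i] = total
    return scores


STAR = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

if __name__ == "__main__":
    print(gaussian_decay_centrality(STAR))
```

On a star graph the hub receives the highest score, as expected. Evaluating a ranking against SIR-based rankings with Kendall's τ, as the paper does, would be a separate step.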

Citations: 0

Class-Missing Semi-supervised document key information extraction via synergistic refinement estimation

IF 6.9 | Zone 1, Management

Information Processing & Management | Pub Date: 2025-08-26 | DOI: 10.1016/j.ipm.2025.104335

Pengcheng Guo, Yonghong Song, Boyu Wang, Yankai Cao, Jiayang Ren, Chaojie Ji, Jiahao Liu, Qi Zhang, Qiangqiang Mao

Volume 63, Issue 1, Article 104335

Abstract: Current methods for document key information extraction (DKIE) rely heavily on labeled data with high annotation costs. To mitigate this issue, the semi-supervised learning (SSL) paradigm, which uses unlabeled document samples, has gained broad attention in DKIE. However, existing SSL methods require labeled and unlabeled data to share an identical label space, which is impractical in many DKIE tasks, where some unlabeled samples do not belong to any known class in the labeled set. In this paper, we formulate this problem as Class-Missing Semi-supervised (CMSS) DKIE. In DKIE, unknown classes usually belong to minority and fine-grained categories, which intensifies misconnections between known and unknown classes and makes CMSS more challenging. To address this issue, we propose Synergistic Refinement Estimation (SRE), a progressive prototype estimation scheme that alleviates the bias of unknown classes toward the majority known classes on long-tailed unlabeled data. Furthermore, we propose dynamic threshold hash rectification and structural calibration mechanisms to correct connections between fine-grained classes. Extensive experimental results demonstrate that SRE surpasses existing state-of-the-art methods on several DKIE benchmarks. Code is available at https://github.com/anonymoulink/SRE_DKIE.
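A basic ingredient of class-missing semi-supervised learning is deciding whether an unlabeled sample belongs to a known class or to an unseen one. The sketch below shows only that step with a fixed cosine-similarity threshold; SRE's progressive prototype estimation, dynamic threshold hash rectification, and structural calibration are not reproduced, and the threshold value and function names are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_known_or_unknown(features, prototypes, threshold=0.8):
    """Label each sample with its most similar known-class prototype, or -1
    ("unknown") when even the best similarity falls below the threshold."""
    labels = []
    for f in features:
        sims = [cosine(f, p) for p in prototypes]
        best = max(range(len(sims)), key=sims.__getitem__)
        labels.append(best if sims[best] >= threshold else -1)
    return labels


def unknown_prototype(features, labels):
    """Mean of the samples flagged as unknown (None if there are none)."""
    unknown = [f for f, y in zip(features, labels) if y == -1]
    if not unknown:
        return None
    dim = len(unknown[0])
    return [sum(f[d] for f in unknown) / len(unknown) for d in range(dim)]


if __name__ == "__main__":
    known = [[1.0, 0.0], [0.0, 1.0]]  # toy prototypes of two labeled classes
    feats = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7], [0.6, 0.65]]
    labels = assign_known_or_unknown(feats, known)
    print(labels, unknown_prototype(feats, labels))
```

A fixed threshold is the weak point the abstract targets: when unknown classes are minority, fine-grained categories, their samples can sit close to a majority prototype and slip past any single global cut-off.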

Citations: 0

A large language model-based approach for fake review detection: the implicit characteristics perspective

IF 6.9 | Zone 1, Management

Information Processing & Management | Pub Date: 2025-08-26 | DOI: 10.1016/j.ipm.2025.104352

Zhenhua Wang, Aixin Yao, Guang Xu, Ming Ren

Citations: 0

Zero- and few-shot Chinese cybersecurity event detection via meta-distillation learning

IF 6.9 | Zone 1, Management

Information Processing & Management | Pub Date: 2025-08-26 | DOI: 10.1016/j.ipm.2025.104344

Han Zhang, Bingzhi Xu, Shijie Xiao, Chengfang Zhang, Lixia Ji

Volume 63, Issue 1, Article 104344

Abstract: Traditional cybersecurity event detection has focused primarily on English corpora. Chinese corpora, however, pose challenges due to linguistic complexity and the lack of annotated datasets, particularly in recognizing nested compound trigger words and handling zero- and few-shot scenarios. To address these issues, we propose zero- and few-shot Chinese cybersecurity event detection via meta-distillation learning (CCED). First, we introduce a dynamic dimension transformation mechanism that embeds geometric information into span representations for extracting nested compound trigger words from a Chinese corpus. Second, we propose meta-distillation learning, which integrates meta-learning with contrastive knowledge distillation to improve model performance; by facilitating knowledge transfer across tasks, it boosts accuracy in zero- and few-shot scenarios. Moreover, to fill the gap in datasets for Chinese cybersecurity event detection, we develop CSED, to the best of our knowledge the first publicly available annotated dataset in this domain. It includes a large collection of news articles from sources such as CNCERT and Twitter, with 17,542 event instances categorized into 2 event types and 9 subtypes. CCED achieves state-of-the-art F1 scores on CSED of 57.61%, 76.83%, and 79.14% across its zero-shot and few-shot settings. The dataset and code are available on GitHub: https://github.com/vegetable-edu/CCED.
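Span-based extractors handle nested triggers by scoring every candidate span, so a short trigger and a longer compound trigger that contains it can both be predicted. The sketch below shows only that enumeration step; CCED's dynamic dimension transformation and geometric span embeddings are not reproduced, and `max_len` and the example string 勒索软件攻击 ("ransomware attack") are illustrative assumptions.

```python
def enumerate_spans(tokens, max_len=4):
    """All contiguous (start, end, text) spans, end inclusive, with length
    <= max_len. Overlapping and nested spans are all kept as candidates."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start, min(start + max_len, len(tokens))):
            spans.append((start, end, "".join(tokens[start:end + 1])))
    return spans


def nested_in(inner, outer):
    """True if span `inner` lies inside span `outer` and is not the same span."""
    return inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]


if __name__ == "__main__":
    chars = list("勒索软件攻击")  # character-level tokens for Chinese text
    candidates = enumerate_spans(chars, max_len=6)
    print(len(candidates))
    print(nested_in((4, 5, "攻击"), (0, 5, "勒索软件攻击")))
```

Enumeration grows roughly as sequence length times `max_len`, which is why span models usually cap span length.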

Citations: 0

Multi-faceted consistency data augmentation for graph anomaly detection

IF 6.9 | Zone 1, Management

Information Processing & Management | Pub Date: 2025-08-25 | DOI: 10.1016/j.ipm.2025.104338

Tairan Huang, Yili Wang, Qiutong Li, Jianliang Gao

Volume 63, Issue 1, Article 104338

Abstract: Graph-based anomaly detection has become a prominent research area, driven by its critical applications in domains such as fraud detection, financial security, and biomedicine. However, existing methods face significant challenges, including label imbalance, feature camouflage, and limited supervision. In this paper, we propose McGAD, a method that incorporates two facets of consistency data augmentation: Structural Consistency Augmentation and Learnable Unsupervised Consistency Augmentation. In structural consistency augmentation, we use a heat wavelet diffusion pattern to capture the spectral graph wavelets of the nodes and treat the wavelets as probability distributions. McGAD then uses an empirical characteristic function to convert the wavelets into low-dimensional embeddings that describe each node's neighborhood. Nodes whose neighborhoods have high structural consistency will have similar structural embeddings even if they are far apart in the graph, a property for which we provide a mathematical proof. This yields more effective embeddings for structurally consistent nodes of the same class, which helps address label imbalance and feature camouflage. Moreover, we design a learnable unsupervised consistency augmentation module to handle limited supervision. Making the whole augmentation process learnable enables the model to fully exploit information from unlabeled nodes. Extensive experiments on four benchmark datasets demonstrate the superiority of McGAD. In particular, with a training ratio of only 1%, McGAD achieves 93.67% AUC and 91.81% F1-Macro on Amazon, outperforming 15 state-of-the-art baselines by up to 4.90% AUC and 3.75% F1-Macro.
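The empirical characteristic function step can be written in a few lines, following the GraphWave-style formulation φ(t) = (1/N) Σ_m exp(i t ψ_m), sampled at several points t with real and imaginary parts concatenated. This is a sketch under the assumption that McGAD uses this standard form; the heat-wavelet coefficients ψ are taken as precomputed inputs, and the sampling points are arbitrary.

```python
import cmath


def ecf_embedding(wavelet_coeffs, t_values):
    """Sample the empirical characteristic function
    phi(t) = mean_m exp(i * t * psi_m) of one node's wavelet coefficients
    at each t, concatenating the real and imaginary parts."""
    n = len(wavelet_coeffs)
    embedding = []
    for t in t_values:
        phi = sum(cmath.exp(1j * t * psi) for psi in wavelet_coeffs) / n
        embedding.extend([phi.real, phi.imag])
    return embedding


if __name__ == "__main__":
    ts = [0.0, 1.0, 2.0]
    node_a = [0.1, 0.5, 0.4]  # toy wavelet coefficients
    node_b = [0.4, 0.1, 0.5]  # same values, different order
    print(ecf_embedding(node_a, ts))
    print(ecf_embedding(node_b, ts))
```

Because φ depends only on the distribution of coefficients, two nodes with the same wavelet distribution get identical embeddings regardless of where they sit in the graph, which is the property the abstract relies on.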

Citations: 0

Causal inference for alleviating confounding bias in multi-criteria rating recommendation

IF 6.9 | Zone 1, Management

Information Processing & Management | Pub Date: 2025-08-25 | DOI: 10.1016/j.ipm.2025.104364

Zhihao Guo, Peng Song, Chenjiao Feng, Kaixuan Yao, Jiye Liang

Volume 63, Issue 1, Article 104364

Abstract: Integrating multi-criteria (MC) ratings into recommender systems can enhance the service quality of online platforms. MC ratings depict finer-grained user preferences along multiple dimensions; a hotel system, for example, collects ratings on overall, location, cleanliness, and other criteria. Existing MC methods focus on mining correlations from historical interactions through a data-driven paradigm. However, such methods may capture spurious associations in observations biased by various confounders, which can reduce prediction accuracy. So far, how to alleviate confounding bias in MC rating recommendation remains unexplored. To fill this research gap, we propose a novel Deconfounding Multi-Criteria Recommendation (DMCR) framework that mitigates the harmful impact of confounders. Specifically, we block the back-door paths that cause bias through front-door adjustment and estimate the causal effect of the user-item pair on the overall rating. In the inference phase, DMCR approximates the post-intervention outcome using conditional probabilities on the observational MC data. Moreover, we leverage a graph neural network to model the underlying higher-order dependencies in MC ratings, which helps capture the heterogeneity of users' MC behavioral preferences. Experimental results on six real-world datasets demonstrate that DMCR outperforms existing baselines.
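Front-door adjustment identifies an interventional distribution from observational data when a mediator M intercepts every directed path from X to Y, X has no unblocked back-door path to M, and X blocks every back-door path from M to Y. The formula itself is standard; mapping it onto DMCR (X as the user-item pair, M as the criterion ratings, Y as the overall rating) is an assumption, and the binary probability tables below are made-up toy values.

```python
def front_door(p_x, p_m_given_x, p_y_given_xm, x, y):
    """P(Y=y | do(X=x)) = sum_m P(m | x) * sum_x' P(y | x', m) * P(x')."""
    total = 0.0
    for m, p_m in p_m_given_x[x].items():
        # Inner sum re-weights over all treatments x' to block the back-door path M <- X <- U -> Y.
        inner = sum(p_y_given_xm[(x_prime, m)][y] * p_xp for x_prime, p_xp in p_x.items())
        total += p_m * inner
    return total


# Toy observational tables (binary X, M, Y).
P_X = {0: 0.5, 1: 0.5}
P_M_GIVEN_X = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
P_Y_GIVEN_XM = {
    (0, 0): {0: 0.8, 1: 0.2},
    (0, 1): {0: 0.4, 1: 0.6},
    (1, 0): {0: 0.7, 1: 0.3},
    (1, 1): {0: 0.3, 1: 0.7},
}

if __name__ == "__main__":
    for x in (0, 1):
        print(x, front_door(P_X, P_M_GIVEN_X, P_Y_GIVEN_XM, x=x, y=1))
```

With these tables the interventional probabilities are 0.33 for do(X=0) and 0.53 for do(X=1); the inner sum uses the marginal P(x') rather than conditioning on x, which is what removes the confounded dependence.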

Citations: 0