{"title":"Attention-guided residual shrinkage with gated recurrent unit for human activity recognition","authors":"S. Banushri , R. Jagadeesha","doi":"10.1016/j.ipm.2025.104439","DOIUrl":"10.1016/j.ipm.2025.104439","url":null,"abstract":"<div><div>Human Action Recognition (HAR) is crucial for applications like video surveillance. Although numerous algorithms have been developed for HAR, these algorithms have failed to appropriately extract spatial and temporal features. In this manuscript, a HAR architecture is developed that integrates Multiple Residual Shrinkage Building Units (MRSBU) for spatial feature extraction with a Gated Recurrent Unit (GRU) for temporal modeling. The architecture employs Inception v3 to capture rich spatial features from video frames, and the MRSBU learns complex temporal features using an adaptive soft-thresholding mechanism to suppress noisy and redundant features. The output is then fed into a GRU with a temporal attention mechanism that assigns dynamic significance to each frame and classifies the human activities. This lightweight and effective model demonstrates superior generalization across multiple challenging HAR datasets. The proposed MRSBU with GRU algorithm achieved an accuracy of 99.75 % on the UCF50 dataset, 99.55 % on the UCF101 dataset, and 98.95 % on the HMDB51 dataset, outperforming conventional Convolutional Neural Network–Long Short-Term Memory (CNN-LSTM), Vision Transformer (ViT), and 3D-CNN models. 
These results show the proposed model’s effectiveness and robustness across different video scenarios, including real-world surveillance and anomaly detection applications.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104439"},"PeriodicalIF":6.9,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145320075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge precedence networks: Mining progression patterns of scientific discoveries beyond prerequisites","authors":"Shibing Xiang , Bing Liu , Xin Jiang , Zhengan Huang , Yifang Ma","doi":"10.1016/j.ipm.2025.104424","DOIUrl":"10.1016/j.ipm.2025.104424","url":null,"abstract":"<div><div>Understanding how knowledge evolves through scientists’ career paths is essential for advancing education and innovation. This study constructs Knowledge Precedence Networks (KPNs) to uncover scientific progression patterns in real-world practice across 19 disciplines, analyzing the research trajectories of 4,969,403 scientists and 80 million publications from the OpenAlex dataset. We propose the <strong>CoCiTCD</strong> method, which integrates <strong>Co</strong>-<strong>Ci</strong>ting networks with <strong>T</strong>emporal <strong>C</strong>ommunity <strong>D</strong>etection to capture knowledge progression structures by identifying research communities, selecting representative concepts, and deriving temporal concept pairs. KPNs across Mathematics, Computer Science, and Engineering emphasize the critical role of foundational concepts in supporting advanced topics. For example, Algorithms bridge Mathematics and Computer Science, driving advancements in Artificial Intelligence and Data Science. We evaluate the alignment between KPNs for 303 concepts and theoretical prerequisite relations annotated by large language models, revealing how scientists engage with knowledge over time. The KPN attains a recall of 25.77% in the best case, complemented by the citation-based KCN reaching 26.6%. This consistently low alignment indicates that empirical real-world topic transitions frequently diverge from theoretical prerequisite orderings. Furthermore, an AUC of 0.76 on our sample variational ROC curve underscores the robustness of our KPN approach in capturing the nuanced, innovative nature of knowledge progression. 
The KPNs provide valuable insights for research planning, learning path design, interdisciplinary collaboration, and understanding the hierarchical knowledge structure, thereby contributing to the Science of Science by uncovering real patterns of knowledge progression across disciplines.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104424"},"PeriodicalIF":6.9,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145320076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LRSA: LLM-RecSys alignment for time-specific next POI recommendation","authors":"Jinhui Zhu, Xiangfeng Luo, Xin Yao, Xiao Wei","doi":"10.1016/j.ipm.2025.104434","DOIUrl":"10.1016/j.ipm.2025.104434","url":null,"abstract":"<div><div>Time-specific next point-of-interest (POI) recommendation aims to predict which POI a user will visit at a given time, a task challenged by the limited textual information in ID-based historical data. Although large language models (LLMs) demonstrate strong commonsense reasoning, their performance in POI recommendation remains suboptimal due to the semantic gap between textual inputs and ID-based user preferences. To address this, we propose a novel LLM-RecSys Alignment (LRSA) framework. First, a historical fact collector is designed to identify the influential trajectories efficiently. Second, a rotational alignment is proposed to align the semantics of the LLMs with the ID-based models. Finally, we design a fuse prompt to incorporate the user preference into the plain text prompt. Moreover, rather than directly inputting the fuse prompt into the LLMs, we propose hierarchical prompt tuning to facilitate a two-stage learning process, progressing from a low-level plain text prompt to a high-level multimodal fuse prompt for the LLMs. 
Experiments on three benchmark datasets (NYC, TKY, and Gowalla) demonstrate that our method achieves average improvements of 24.39% and 9.93% in Acc@1, 25.26% and 12.79% in NDCG@5, and 10.46% and 15.41% in MRR over the latest ID-based models and LLM-based models, respectively.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104434"},"PeriodicalIF":6.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can large language models replace human experts in knowledge construction? A comparative analysis from the perspectives of information quality, information perception, and information load","authors":"Jingzhu Wei, Zhipeng Chen","doi":"10.1016/j.ipm.2025.104437","DOIUrl":"10.1016/j.ipm.2025.104437","url":null,"abstract":"<div><div>This study evaluates how large language models support knowledge construction by comparing Dense and Mixture of Experts architectures with human expert texts across six dimensions: Intrinsic Information Quality, Contextual Information Quality, Representational Information Quality, Linguistic Affinity, Structural Clarity, and Information Load. Using 2028 questions and 6084 responses, we compute composite indicators and estimate hierarchical regressions. MoE attains the highest Representational Information Quality in 89.69 % overall, rising to 95.72 % in closed domains, and the highest Linguistic Affinity in 65.30 %, but incurs high Information Load in 81.71 %. Dense leads Structural Clarity in 72.87 % and yields stable yet conservative expression. Human experts maintain low Information Load in 77.42 %. Regressions show that complexity increases representational expressiveness and load but weakens contextual alignment, while specificity increases affinity, structure, and load. Both model types lag behind the human standard on Intrinsic Information Quality and Contextual Information Quality. 
Findings support task-aligned model selection and hybrid workflows.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104437"},"PeriodicalIF":6.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel automated policy text evaluation framework integrating PMC into large language models","authors":"Xiaobin Lu, Zinan Yang, Chaoguang Huo","doi":"10.1016/j.ipm.2025.104440","DOIUrl":"10.1016/j.ipm.2025.104440","url":null,"abstract":"<div><div>Automated policy text evaluation is a critical research topic in policy informatics. Previous methods rely predominantly on manual variable assignment, making them inadequate for large-scale policy evaluation, while their context-dependent indicators extracted from specific policies lack cross-domain applicability. To address this, we propose a novel automated policy text evaluation framework by redesigning three generalized first-level evaluation indicators that are applicable to policies in any domain, integrating the Policy Modeling Consistency (PMC) model into large language models (LLMs), and constructing automated PMC-scoring models based on LLaMA-3-Chinese-8B and Qwen-2.5-7B, respectively. Using Chinese S&T policies as examples, we construct the first Chinese policy evaluation dataset with 22,630 labeled policy samples and train four PMC indicator automated calculation models. Compared to the baselines, the model based on Qwen-2.5-7B achieves the best performance in the evaluation of policy character, with an F1-score of 80.41%. The model based on LLaMA-3-Chinese-8B achieves the best performance in the evaluation of policy normativity and policy function, with F1-scores of 75.07% and 74.11%, respectively. This enables the automated calculation of PMC indices and the generation of a multi-input-output table for comprehensive policy analysis. Applications to biosafety policies and data governance policies validate the cross-domain applicability of the framework. 
As the first framework for automated PMC evaluation, our methodology provides an innovative approach for large-scale, cross-domain policy evaluation.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104440"},"PeriodicalIF":6.9,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What drives attention sinks? A study of massive activations and rotational positional encoding in large vision–language models","authors":"Xiaofeng Zhang , Yuanchao Zhu , Chaochen Gu , Jiawei Cao , Hao Cheng , Kaijie Wu","doi":"10.1016/j.ipm.2025.104431","DOIUrl":"10.1016/j.ipm.2025.104431","url":null,"abstract":"<div><div>We identify that visual attention sinks — image tokens receiving disproportionately high attention despite semantic irrelevance — are caused by a dual challenge in large vision–language models (LVLMs): (1) the structural bias of Rotary Position Embedding (RoPE), which applies one-dimensional long-term decay to linearized image tokens and creates an “image alignment bias”; and (2) the resulting massive activation in specific hidden dimensions (e.g., 1415, 2533), which skews the attention distribution via softmax saturation. To address this systematically, we propose a two-stage framework: Manhattan Causal Attention (MCA) and Vision Attention Allocation (VAA). MCA first mitigates the structural bias by replacing 1D position indices with 2D spatial coordinates and computing relative positions using Manhattan distance, thereby preserving the image’s spatial locality. Building upon MCA, VAA then acts as a plug-and-play refinement, reallocating attention from residual visual anchor tokens to semantically meaningful regions within Image-Focused heads. Extensive experiments show that our method significantly improves performance across multiple LVLMs and benchmarks. On object hallucination tasks (POPE, CHAIR), VAA achieves over 10% improvement in accuracy and consistency. It also brings consistent gains (up to 12%) on general reasoning tasks (GQA, ScienceQA). 
This work not only reveals new insights into the behavior of attention mechanisms in LVLMs, but also provides an effective and generalizable solution to improve visual grounding and reduce hallucinations.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104431"},"PeriodicalIF":6.9,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A density-driven graph-based clustering model with adaptive outlier recognition","authors":"Jiayi Tang , Meng Zhang , Ruiyan Ma , Jingjing Xiao","doi":"10.1016/j.ipm.2025.104436","DOIUrl":"10.1016/j.ipm.2025.104436","url":null,"abstract":"<div><div>Clustering performance often deteriorates in the presence of outliers, especially when existing algorithms exhibit high parameter sensitivity and computational inefficiency. To address these limitations, we propose a <em>density-driven graph-based clustering algorithm with adaptive outlier recognition</em>. The method integrates local density estimation into a graph construction framework to dynamically identify and suppress outliers, ensuring that the learned affinity graph is based on reliable samples only. Furthermore, we introduce a non-negative matrix factorization (NMF)-based eigenvector approximation strategy, which reformulates the Laplacian rank constraint as a regularization term, thereby reducing both parameter dependence and computational burden. An iterative optimization scheme with theoretical convergence guarantees is developed to solve the resulting problem efficiently. Extensive experiments on two synthetic datasets and eleven real-world datasets—including Yale (165 samples), COIL100 (7200), ALOI (10800), CIFAR100 (50000) and MNIST (60000)—demonstrate that our method consistently ranks among the top three performers. Notably, it outperforms competitive baselines by approximately 5% on several datasets, and when 10% outliers are injected, it maintains leading performance with over 5% improvement over state-of-the-art methods. These results confirm the proposed algorithm’s robustness, scalability, and accuracy in both clean and noisy scenarios. 
The implementation of the proposed method is publicly available at: https://github.com/PhdJiayiTang/RCAL.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104436"},"PeriodicalIF":6.9,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An overview of opinion polarization: models, drivers, and strategic solutions","authors":"Shenghua Liu , Zhibin Wu , Luis Martínez","doi":"10.1016/j.ipm.2025.104433","DOIUrl":"10.1016/j.ipm.2025.104433","url":null,"abstract":"<div><div>In the context of social media and algorithmic personalization shaping information flows, opinion polarization poses a significant challenge to information processing and decision making. In this paper, we retrieved Web of Science records and applied a multi-stage screening process, yielding 145 rigorously selected papers on opinion polarization. From these sources, we developed a three-dimensional classification framework categorizing polarization models into individual behavior, group dynamics, and network structure. Our analysis reveals that cognitive bias in information processing, identity homophily within social groups, and algorithmic filtering in online platforms serve as core drivers of digital opinion polarization. Based on these insights, we propose strategic solutions across four interdependent domains: information level interventions, dialogue level facilitation, technology level adjustments, and psychology level guidance. We illustrate how these measures can be operationalized to mitigate echo chamber effects and foster cross-cutting engagement. This review synthesizes existing research to offer a structured foundation for understanding complex polarization mechanisms. It also provides a theoretical basis for future cross-platform dynamic modeling and policy development. 
Finally, we identify critical research gaps and outline future directions, highlighting the need for adaptive AI-driven moderation strategies, dynamic polarization models, and coordinated cross-platform policy interventions in the evolving digital landscape.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104433"},"PeriodicalIF":6.9,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing fake news video detection with self-driven question–answer from LMMs","authors":"Fang Liu , Yili Li , Jian Lang , Rongpei Hong , Fan Zhou","doi":"10.1016/j.ipm.2025.104432","DOIUrl":"10.1016/j.ipm.2025.104432","url":null,"abstract":"<div><div>The widespread dissemination of fake news on online video sharing platforms endangers politics and public health. Existing approaches to Fake News Video Detection (FNVD) primarily focus on modeling the multimodal content within the video itself, yet they often struggle to effectively handle videos with complex semantics or sophisticated manipulations. To address these challenges, we propose <strong>SAFE</strong>, a novel <strong>S</strong>elf-driven question <strong>A</strong>nswering <strong>F</strong>ramework with Large Multimodal Models (LMMs) for <strong>E</strong>nhanced fake news video detection. The core innovation of our method is to extract rich parametric knowledge from pre-trained LMMs as a crucial semantic complement to the video’s original multimodal content for enhanced detection. Specifically, we introduce a new <em>Self-Driven Divergent Reasoning</em> paradigm, where two LMMs autonomously engage in a multi-round dialog, yielding diverse question–answer (QA) pairs relevant to the authenticity of the video content. To effectively leverage this external knowledge, we develop a <em>Cross-Modal Relevance Guided Ensemble module</em> that selectively integrates informative QA cues while filtering out irrelevant or hallucinated noise. Furthermore, we design a <em>QA-Aware Cross-Modal Fusion Network</em> that performs fine-grained semantic alignment between the distilled QA knowledge and modality-specific video features through a tailored cross-attention mechanism, achieving consistent performance gains. 
Extensive experiments on three video benchmarks demonstrate that SAFE consistently surpasses state-of-the-art baselines, achieving an average improvement of 2.6% across four metrics in detection evaluation and an impressive 27.4% gain in cross-platform generalization evaluation.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104432"},"PeriodicalIF":6.9,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view debiasing representation learning for recommender systems","authors":"Qingfeng Chen , Jianfeng Deng , Debo Cheng , Jiuyong Li , Lin Liu","doi":"10.1016/j.ipm.2025.104429","DOIUrl":"10.1016/j.ipm.2025.104429","url":null,"abstract":"<div><div>Recommender systems aim to predict user feedback on unseen items, but confounding bias, particularly from latent confounders, presents a major challenge. Existing debiasing methods in recommender systems often overlook the complex interplay among multiple features and the subtleties of user preferences. To address this, we propose a novel framework called Multi-View-based Identifiable Debiased Learning (MViDL) for recommendations, even in the presence of latent confounders. Specifically, MViDL first employs a multi-view framework to discern interactions between user and item features, unearth user interests in specific items, and capture fundamental user and item ID information. To mitigate the effects of latent confounders, MViDL incorporates the identifiable Variational Auto-Encoder (iVAE) to efficiently infer the latent representation from a set of proxy variables and adjusts for the learned latent representation to mitigate confounding bias. We further provide a theoretical analysis of the identifiability of the latent representations. Extensive evaluations on three real-world datasets highlight the superiority of MViDL. 
Specifically, our approach achieves average improvements of approximately 6.03% and 5.70% in NDCG@K and Recall@K over the state-of-the-art (SOTA) baselines on Coat, 3.54% and 2.29% on Yahoo!R3, and 1.49% and 2.47% on KuaiRand.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104429"},"PeriodicalIF":6.9,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}