{"title":"Reversible source-aware natural language watermarking via customized lexical substitution","authors":"Ziyu Jiang, Hongxia Wang, Zhenhao Shi, Run Jiao","doi":"10.1016/j.ipm.2024.103977","DOIUrl":"10.1016/j.ipm.2024.103977","url":null,"abstract":"<div><div>Current natural language watermarking (NLW) methods generate suitable watermark words based on local context using pre-trained models (PLMs), minimizing semantic loss in watermarked text. However, these methods still exhibit some limitations. Specifically, there is room for improvement on substitutes quality and watermark imperceptibility since they integrate off-the-shelf lexical substitution (LS) models, which are not specifically tailored for watermarking algorithms. They make strict synchronization constraints to generate identical substitutes list from the original and the watermarked text, and therefore precludes consideration of some high-quality substitutes, which curtails the watermark capacity. Additionally, the local context changes via watermarking embedding, and these methods cannot losslessly recover the original text, limiting the application of NLW to high-precision scenarios such as government documents, military, and medical applications. To address these issues, we propose a reversible source-aware NLW approach, which performs proactive mining to identify potential reversible watermark positions by virtue of a PLM and subsequently embeds the watermark into the text via source-aware LS. Also, we have designed a novel LS algorithm tailored for NLW to enhance the imperceptibility and textual fidelity of watermarked content. 
Experiments validate the efficiency of our LS method in generating the most suitable substitutes and verify that our NLW approach achieves complete reversibility while enhancing watermark capacity and textual fidelity compared to prior art.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 2","pages":"Article 103977"},"PeriodicalIF":7.4,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142721386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing emotion recognition in social media: A novel integration of heterogeneous neural networks with fine-tuned language models","authors":"Abbas Maazallahi , Masoud Asadpour , Parisa Bazmi","doi":"10.1016/j.ipm.2024.103974","DOIUrl":"10.1016/j.ipm.2024.103974","url":null,"abstract":"<div><div>Social media platforms have emerged as crucial sources for emotion analysis, but the issue of non-compliance in labeling by fine-tuned large language models (LLMs) can significantly impact the accuracy of emotion classification. This study addresses this challenge by introducing a <strong><em>novel compliance-driven training set</em></strong> that systematically harmonizes label discrepancies across multiple LLMs, thereby enhancing classification accuracy by over 5% on the non-compliance set. Integrating this compliance set with a Heterogeneous Neural Network (HNN) architecture, we propose a robust framework for emotion classification. Our approach is validated on three diverse datasets, GoEmotion, Friends, and TEC, demonstrating substantial improvements in accuracy, F1 score, and recall over baseline models. These results confirm the effectiveness of our compliance-driven strategy and establish a new benchmark for emotion recognition in social media content. 
The proposed framework offers a versatile and scalable solution applicable across various languages and platforms, ensuring broad utility in advanced emotion classification tasks.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 2","pages":"Article 103974"},"PeriodicalIF":7.4,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overlap to equilibrium: Oversampling imbalanced datasets using overlapping degree","authors":"Sidra Jubair , Jie Yang , Bilal Ali","doi":"10.1016/j.ipm.2024.103975","DOIUrl":"10.1016/j.ipm.2024.103975","url":null,"abstract":"<div><div>Imbalanced and overlapping class distributions present several challenges, including poor generalization, misleading accuracy, and inflated importance of the majority class, which further complicate the classification task. To tackle this, we introduce a new novel oversampling method called GOS that generates samples from positive overlapping samples for imbalanced and overlapping data which improves the classification performance. Firstly, In GOS, a novel concept termed overlapping degree is introduced utilizing both local and global information from positive and negative samples. Secondly, it measures how much a positive sample contributes to the overlapping region and helps to identify positively overlapping samples. Lastly, the identified positive overlapping samples are transformed to generate new positive samples with a transformation matrix derived from the distribution information of all positive samples. We compare GOS with 14 commonly used under-sampling, oversampling, and advanced oversampling methods on 15 publicly available real imbalanced datasets with sample sizes varying from 178 to 2000 having an imbalance ratio varying from 2.02 to 41.4. 
The experimental results show that GOS outperforms these baselines achieving average improvements of 3.2 % in accuracy, 2.5 % in G-mean, 4.5 % in F1-score, and 5.2 % in AUC.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 2","pages":"Article 103975"},"PeriodicalIF":7.4,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fusion of generative adversarial networks and non-negative tensor decomposition for depression fMRI data analysis","authors":"Fengqin Wang , Hengjin Ke , Yunbo Tang","doi":"10.1016/j.ipm.2024.103961","DOIUrl":"10.1016/j.ipm.2024.103961","url":null,"abstract":"<div><h3>Objective:</h3><div>This study introduces a novel approach, F-GAN-NTD, which integrates Generative Adversarial Networks (GANs) with Non-negative Tensor Decomposition (NTD) theory to enhance the analysis of functional Magnetic Resonance Imaging (fMRI) data related to depression.</div></div><div><h3>Methods:</h3><div>F-GAN-NTD is applied to extract nonlinear non-negative factors from multidimensional fMRI tensor data, utilizing Deep-NTD technology to generate factor matrices that capture latent structures and dynamic features. A multi-view neural network architecture processes these factor matrices from all modalities simultaneously, enabling comprehensive pattern discrimination between depression patients and healthy controls. The method is tested on the Closed Eyes Depression fMRI (CEDF) and Strategic Research Program for Brain Sciences (SRPBS) datasets.</div></div><div><h3>Results:</h3><div>The F-GAN-NTD method demonstrates significant improvements in fMRI data classification, outperforming traditional approaches. It also effectively restores incomplete fMRI tensor data and reveals abnormal brain network connections, offering insights into the pathophysiological mechanisms of depression.</div></div><div><h3>Conclusions:</h3><div>F-GAN-NTD enhances the extraction of meaningful features from fMRI data, improving classification performance and providing a deeper understanding of depression-related brain abnormalities. 
The integration across modalities contributes to a more comprehensive analysis of depression.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 2","pages":"Article 103961"},"PeriodicalIF":7.4,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How can consumers without credit history benefit from the use of information processing and machine learning tools by financial institutions?","authors":"Bjorn van Braak, Joerg R. Osterrieder, Marcos R. Machado","doi":"10.1016/j.ipm.2024.103972","DOIUrl":"10.1016/j.ipm.2024.103972","url":null,"abstract":"<div><div>This research aims to enhance the predictability of creditworthiness among marginalized consumers affected by the widespread adoption of AI frameworks. We utilize ensemble methods to handle the imbalanced dataset used for evaluating the credit risk of consumers with sparse or non-existent credit histories. To promote fairness in the Machine Learning (ML) model, we employed the disparate impact remover—a recognized bias mitigation tool to minimize group bias. Three strategies were employed to tackle dataset imbalance: oversampling, undersampling, and class weight adjustment. Our findings reveal that adjusting the class weight proved most effective in sustaining commendable performance, demonstrating higher accuracy and F-1 scores surpassing 80% in most experiments. While the application of the disparate impact remover might compromise the ML model’s predictive capabilities, our results underscore the necessity of deliberating over the use of potentially bias-sensitive, unprotected features. 
Recognizing the critical nature of this trade-off for financial decision-makers, we delve into its implications.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 2","pages":"Article 103972"},"PeriodicalIF":7.4,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-level semantics probability embedding for image–text matching","authors":"An-An Liu , Long Yang , Wenhui Li , Weizhi Nie , Xianzhu Liu , Haipeng Chen","doi":"10.1016/j.ipm.2024.103968","DOIUrl":"10.1016/j.ipm.2024.103968","url":null,"abstract":"<div><div>The requirement of image–text matching is to retrieve matching images or texts based on textual or visual queries. However, image–text matching is inherently a many-to-many problem, as an image can correspond to multiple levels of visual semantic scenes, which can be described by different texts. Similarly, textual descriptions can be visualized through multiple visual scenes. This leads to ambiguity in the matching between images and texts. To better capture these matching relationships, we employ graph convolutional networks to extract multi-level semantic information for image–text pairs, and construct Gaussian distribution representations for image and text instead of conventional point representations. Furthermore, we introduce a inter-modal mixture of Gaussian distribution to constrain the matching relationships between image–text pairs, which ensures more precise distribution representations in a shared space and strengthens the correlation between cross-modal. 
Experiments on Flickr30K and MS-COCO, two widely used datasets, demonstrate the superior performance of our approach.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 2","pages":"Article 103968"},"PeriodicalIF":7.4,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Focus on user micro multi-behavioral states: Time-sensitive User Behavior Conversion Prediction and Multi-view Reinforcement Learning Based Recommendation Approach","authors":"Shanshan Wan , Shuyue Yang , Zebin Fu","doi":"10.1016/j.ipm.2024.103967","DOIUrl":"10.1016/j.ipm.2024.103967","url":null,"abstract":"<div><div>In recommender systems, user behavior conversion implies user interest drifts and behavior patterns. However, current research has paid little attention to the correlation between target behavior conversion rate and user behavior patterns, and the impact of highly time-sensitive multi-behavior analysis on target behavior conversion rate is neglected. Meanwhile, compared to normal behavior conversions, user deviant behavior conversions are seldom studied. The behavior conversion rate that balances normal behavior patterns and deviant behavior patterns can more accurately reflect user interest drifts and real-time needs, thereby improving recommendation performance. Based on the above motivations, we propose a Time-sensitive Behavior Conversion Prediction and Multi-view Reinforcement Learning Based Recommendation Approach (TCMR), aiming to achieve more accurate and adaptive recommendations by analyzing user interest drifts, demand timings and behavior stability. First, we construct a hyper-behavior spatial model of highly collaborative temporal signals, and propose a subnet collaborative method to obtain normal behavior patterns, in which, core subnet, similarity subnet and behavior subnet are extracted from the hyper-behavior spatial model. Subsequently, we design a multi-level user behavior trajectory tree to perceive potential user deviant behaviors by comparing behavior conversions within the single behavior modality and across different behavior modality. 
By integrating normal behaviors and deviant behaviors, we evaluate user interest drifts, demand timings, and behavior stability, and ultimately obtain a prediction of behavior conversion rate. Finally, a multi-perspective asynchronous reinforcement learning is proposed, enabling TCMR to provide recommendations by considering multiple user perspectives and purposes. Experimental results demonstrate that TCMR exhibits superior recommendation performance and effectiveness.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 2","pages":"Article 103967"},"PeriodicalIF":7.4,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TourPIE: Empowering tourists with multi-criteria event-driven personalized travel sequences","authors":"Mariam Orabi, Imad Afyouni, Zaher Al Aghbari","doi":"10.1016/j.ipm.2024.103970","DOIUrl":"10.1016/j.ipm.2024.103970","url":null,"abstract":"<div><div>Tourism stands as a robust global industry, yet modern travelers increasingly crave personalized and immersive experiences in new destinations. While existing research has focused on constructing recommender systems for tourist venues from static sources, a crucial gap remains in addressing transient and upcoming attractions. Motivated by this, we present TourPIE, an innovative approach that bridges this divide by integrating both static and dynamic sources of Points of Interest (POI) lists. Leveraging insights from social media posts, TourPIE identifies tourism-related events and unveils upcoming attractions in real time. This groundbreaking system introduces two novel recommender algorithms, TourPIE-RO and TourPIE-RC, designed to dynamically suggest travel sequences based on contextual criteria such as budget, distance, and interests. In a comparative study across a dataset of 489 venues combining events and POI, TourPIE outperforms baseline methods, achieving a balance between relevant attractions and cost-effective routes while minimizing travel distance. Results show improved interest profit while reducing traveling distance by at least 10 km, and at least a <span><math><mrow><mo>×</mo><mn>2</mn></mrow></math></span> improvement in distance overhead compared to balanced baselines. Additionally, TourPIE nearly aligns with routes of single-criteria greedy baselines. 
These findings underscore TourPIE’s effectiveness in recommending tailored travel plans for modern explorers seeking diverse and unforgettable experiences.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 2","pages":"Article 103970"},"PeriodicalIF":7.4,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concept-aware embedding for logical query reasoning over knowledge graphs","authors":"Pengwei Pan , Jingpei Lei , Jiaan Wang , Dantong Ouyang , Jianfeng Qu , Zhixu Li","doi":"10.1016/j.ipm.2024.103971","DOIUrl":"10.1016/j.ipm.2024.103971","url":null,"abstract":"<div><div>Logical query reasoning over knowledge graphs (KGs) is an important task for querying some information upon specified conditions. Despite recent advancements, existing methods typically focus on the inherent structure of logical queries and fail to capture the commonality among entities and relations, resulting in cascading errors during multi-hop inference. To mitigate this issue, we resort to inferring relations’ domain constraints based on the commonality of their connected entities implicitly. Specifically, to capture the domain constraints of relations, we treat the set of relations emitted by an entity as its implicit concept information and derive a relation’s domain constraint by aggregating the implicit concept information of its head entities. Employing a geometric-based embedding strategy, we enrich the representations of entities in the query with their implicit concept information. Additionally, we design a straightforward yet effective curriculum learning strategy to refine its reasoning skills. Notably, our model can be integrated into any existing query embedding-based logical query reasoning methods in a plug-and-play manner, enhancing their understanding of the entities as well as relations in queries. Experiments on three widely used datasets show that our model can achieve comparable outcomes and improve the performance of existing logical query reasoning models. 
Particularly, as a plug-in, it achieves an absolute improvement of the maximum 8.4% Hits@3 compared to the original model on the FB15k dataset, and it surpasses the former state-of-the-art plug-and-play logical query reasoning model in most scenes, exceeding it by up to 2.1% average Hits@3 results.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 2","pages":"Article 103971"},"PeriodicalIF":7.4,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal keyword driven reliable text classification with large language model feedback","authors":"Rui Song , Yingji Li , Mingjie Tian , Hanwen Wang , Fausto Giunchiglia , Hao Xu","doi":"10.1016/j.ipm.2024.103964","DOIUrl":"10.1016/j.ipm.2024.103964","url":null,"abstract":"<div><div>Recent studies show Pre-trained Language Models (PLMs) tend to shortcut learning, reducing effectiveness with Out-Of-Distribution (OOD) samples, prompting research on the impact of shortcuts and robust causal features by interpretable methods for text classification. However, current approaches encounter two primary challenges. Firstly, black-box interpretable methods often yield incorrect causal keywords. Secondly, existing methods do not differentiate between shortcuts and causal keywords, often employing a unified approach to deal with them. To address the first challenge, we propose a framework that incorporates Large Language Model’s feedback into the process of identifying shortcuts and causal keywords. Specifically, we transform causal feature extraction into a word-level binary labeling task with the aid of ChatGPT. For the second challenge, we introduce a multi-grained shortcut mitigation framework. This framework includes two auxiliary tasks aimed at addressing shortcuts and causal features separately: shortcut reconstruction and counterfactual contrastive learning. These tasks enhance PLMs at both the token and sample granularity levels, respectively. 
Experimental results show that, with four different language models as backbones, the proposed method achieves an average performance improvement of more than 1% over the most recent baseline methods on sentiment classification and toxicity detection tasks across 8 datasets.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 2","pages":"Article 103964"},"PeriodicalIF":7.4,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}