{"title":"Learning promotion policies with attention-based deep Q-networks","authors":"Yingnan Xu, Xuchun Wu, Zhenjun Li, Congli Liu, Yansheng Zhang","doi":"10.1007/s10489-025-06914-3","DOIUrl":"10.1007/s10489-025-06914-3","url":null,"abstract":"<div><p>In financial services, personalized promotion strategies are critical for sustaining customer engagement and driving asset growth. We present FAT-DQN, a deep reinforcement learning framework for off-line environments that models sequential decision-making as a Markov Decision Process (MDP), where promotional actions influence future changes in customer assets under management (AUM). FAT-DQN extends the standard Deep Q-Network (DQN) architecture with a multi-head self-attention mechanism over promotion–reward histories augmented by learnable temporal encodings, and applies Feature-wise Linear Modulation (FiLM) to incorporate customer-segment embeddings. To improve robustness, we employ per-customer reward normalization and evaluate policies with both ranking-based metrics and counterfactual off-policy estimators. Empirical results on real promotion logs show that FAT-DQN consistently outperforms baseline methods, yielding a higher mean NDCG@3 (0.7744) compared to Batch-Constrained deep Q-learning (BCQ, 0.7325) and DQN (0.6852). It further improves alignment between predicted and realized outcomes, achieving a Spearman correlation of 0.2584, compared to 0.1619 for BCQ and 0.1522 for DQN. Counterfactual evaluations further show that FAT-DQN delivers consistently strong off-policy estimates, confirming its robustness across evaluation settings. 
These findings demonstrate that attention-based architectures with modulation offer a more effective and interpretable alternative to standard reinforcement learning approaches for personalized promotion planning in financial services.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal prompt learning with selective feature fusion: towards robust cross-modal alignment","authors":"Jiabao Han, Yahui Wang, Wei Zhong, Ying Zhang, Xichao Yuan","doi":"10.1007/s10489-025-06919-y","DOIUrl":"10.1007/s10489-025-06919-y","url":null,"abstract":"<div><p>Vision–language models (VLMs) have shown impressive transferability but still struggle with robustness and generalization when applied to downstream tasks with limited supervision. To address these challenges, we propose a Selective Feature Fusion (SFF) framework that adaptively suppresses noisy visual regions and reinforces task-relevant cross-modal cues through lightweight, learnable gating. Our approach integrates text-guided visual masking and image-aware textual calibration into a unified pipeline, enabling more discriminative and semantically aligned multimodal representations. Comprehensive evaluations across nine widely used benchmarks demonstrate that our method consistently surpasses strong prompt-learning baselines under both few-shot and base-to-novel generalization settings. In particular, under the 8-shot scenario, our approach achieves the best overall accuracy, maintaining a clear margin over representative methods such as CoCoOp and MaPLe. These results highlight not only the robustness of our design but also its effectiveness in capturing cross-modal semantics under data-limited conditions. Further analyses, including ablation studies and qualitative visualizations, confirm that the proposed gating and calibration modules are complementary and play indispensable roles in improving performance. 
Taken together, this work provides a simple yet powerful strategy for enhancing the adaptability and generalization of VLMs in real-world scenarios.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DIMCAR: dynamic intent modeling and context-aware recommendations in sparse data environment towards next basket prediction","authors":"John Kingsley Arthur, Conghua Zhou, Xiang-Jun Shen, Ronky Wrancis Amber-Doh, Eric Appiah Mantey, Jeremiah Osei-Kwakye","doi":"10.1007/s10489-025-06796-5","DOIUrl":"10.1007/s10489-025-06796-5","url":null,"abstract":"<div><p>In the fast-changing world of e-commerce, the success of recommender systems is crucial for boosting user engagement and increasing sales. Conventional models often struggle with evolving user preferences and data sparsity, hindering accurate predictions. Existing graph-based regularization mechanisms and deep learning approaches address these challenges but remain sensitive to noise and computationally costly, limiting their effectiveness in large-scale, real-time settings. To overcome these limitations, we propose a novel multi-layered next-basket recommender system, the dynamic intent modelling and context-aware recommendation (DIMCAR) model. First, we resolve the data sparsity problem by constructing a novel optimized Graph Sparse Regularization framework for Non-negative Matrix Factorization (OGSR-NMF) that integrates a time-varying graph structure, a novel hybrid sparsity norm, and a modified Proximal Alternating Linearized Minimization (mPALM) algorithm. Additionally, we dynamically model user intents and context using attention mechanisms and Gated Recurrent Units (GRUs). Finally, we integrate a novel Adaptive Reptile Basket Optimization Algorithm into a Deep Convolutional Neural Network, enhancing the model's adaptability to changing user behaviours in real time. 
Theoretical analysis and experiments on four benchmark datasets demonstrate that DIMCAR outperforms existing models in recommendation accuracy and user satisfaction.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EGPT-SPE: story point effort estimation using improved GPT-2 by removing inefficient attention heads","authors":"Amna Shahid Cheemaa, Muhammad Azhar, Fahim Arif, Qazi Mazhar ul haq, Muhammad Sohail, Asma Iqbal","doi":"10.1007/s10489-025-06824-4","DOIUrl":"10.1007/s10489-025-06824-4","url":null,"abstract":"<div><p>Estimating story points from user requirements is crucial in the Software Development Life Cycle (SDLC) as it impacts resource allocation and timelines; inaccuracies can lead to missed deadlines and increased costs, harming a company’s reputation. While various techniques have emerged to automate this process, conventional machine learning methods often fail to understand the context of user requirements, and deep learning approaches face high computational costs. To address these issues, the Efficient GPT for Story Point Estimation (EGPT-SPE) algorithm optimizes the Multi-Head Attention module by removing inefficient heads, enhancing accuracy and reducing costs. Experiments on the Choetkiertikul dataset (23,313 issues across 16 open-source projects) and the TAWOS dataset (458,232 issues across 39 open-source projects from 12 public JIRA repositories) demonstrated a 5 to 15 percent accuracy improvement in both within-project and cross-project estimations, validating the algorithm’s effectiveness in agile story point estimation.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RXNet: cross-modality person re-identification based on a dual-branch network","authors":"Weiyang Zhang, Jiong Guo, Qiang Liu, Maoyang Zou, Honggang Chen, Jing Peng","doi":"10.1007/s10489-025-06501-6","DOIUrl":"10.1007/s10489-025-06501-6","url":null,"abstract":"<div><p>The goal of text-based person re-identification (TI-ReID) is to match individuals by integrating information from both images and text. TI-ReID encounters significant challenges because of the clear differences in features between images and textual descriptions. Contemporary techniques commonly merge general and specific characteristics to obtain more detailed feature representations. However, these techniques depend on additional models for estimating or segmenting human poses to determine local characteristics, making it challenging to apply them in practice. To solve this problem, we propose a dual-path network based on RegNet and XLNet for TI-ReID (RXNet). In the image segment, RegNet is employed to acquire multitiered semantic image attributes and dynamically assimilate distinct local features through visual focus. In the text segment, XLNet is utilized to extract significant semantic attributes from the text via a two-way encoding system based on an autoregressive model. Furthermore, to increase the efficacy of our model, we develop both residual triplet attention and dual attention to align features across different modalities. Additionally, we replace cross-entropy ID loss with smoothing ID loss to prevent overfitting while improving the efficiency of the model. 
Experimental results on the CUHK-PEDES dataset show that the proposed method achieves a rank-1/mAP accuracy of 85.49%/73.40%, outperforming the current state-of-the-art methods by a large margin.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning techniques for point cloud tasks: a review","authors":"Xiaona Song, Haozhe Zhang, Lijun Wang, Jinxing Niu, Ying Zhu, Junjie Nian, Ruixue Cheng","doi":"10.1007/s10489-025-06854-y","DOIUrl":"10.1007/s10489-025-06854-y","url":null,"abstract":"<div><p>As a significant means of representing 3D scenes, point clouds are extensively utilized in various fields such as computer vision, autonomous driving, robotic interaction, and urban modeling. Deep learning has achieved remarkable success in the realm of two-dimensional images, and its application to three-dimensional point clouds is also progressively gaining traction. However, the irregular and unstructured nature of point cloud data presents numerous challenges when applying deep learning algorithms to these 3D representations. To foster future research endeavors, this paper concentrates on three fundamental tasks associated with point clouds: classification, object detection, and semantic segmentation. It systematically reviews the current state of development regarding deep learning algorithms pertinent to these tasks. By organizing and analyzing existing literature alongside experimental results derived from publicly available datasets, this paper compares the strengths of different methodologies while also highlighting their limitations. 
Ultimately, it summarizes the technical challenges encountered in advancing deep learning algorithms for point clouds and outlines potential avenues for progress within this domain.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balancing act: engagement detection in online learning through master-assistant models with an enhanced hierarchical attention mechanism","authors":"Tingting Han, Ruqian Liu, Shuwei Dou, Wei Wang, Xiaoming Ding, Wenxia Zhang, Jihao Lang, Wenxuan Li, Jixing Han","doi":"10.1007/s10489-025-06893-5","DOIUrl":"10.1007/s10489-025-06893-5","url":null,"abstract":"<div><p>The rapid expansion of online learning calls for the establishment of effective approaches to monitor and boost student engagement, which constitutes a key element influencing learning outcomes. The class imbalances within engagement datasets pose substantial challenges to precise detection and classification. Existing methods for detecting student engagement in online learning adopt weighted loss to address the issue of class imbalance in public datasets. However, due to the challenge of selecting appropriate weights and the risk of overfitting, the effectiveness of this approach often relies on extensive experiments for manual adjustments. To tackle this problem, we propose a Master-Assistant model that addresses the performance degradation caused by class imbalance and ensures effective detection of student engagement. The Assistant model performs coarse-grained classification according to different assistant strategies, supporting the Master model in fine-grained classification. Furthermore, we extract multiple engagement-related handcrafted features and assign them different weights via an enhanced hierarchical attention mechanism. Finally, an accuracy of 70.69% and an F1-score of 68% are achieved on the Dataset for Affective States in E-Environments (DAiSEE), setting new state-of-the-art (SOTA) scores. 
Additionally, experiments on three other imbalanced datasets also validate the robustness of the Master-Assistant model in solving the class imbalance problem.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection method for improving shape perception of small object defects on metal surfaces","authors":"Xingfei Zhu, Christophe Montagne, Qimeng Wang, Lingxiang Hu, Jinghu Yu, Hedi Tabia, Qianqian Hu","doi":"10.1007/s10489-025-06873-9","DOIUrl":"10.1007/s10489-025-06873-9","url":null,"abstract":"<div><p>Defects on metal surfaces often exhibit complexity with diverse shapes, small sizes, and irregular patterns, leading to frequent missed and false detections during inspection and posing significant challenges to automated detection systems. Existing advanced object detectors, when applied directly to small defect detection on metal surfaces, fail to achieve satisfactory results. To mitigate these issues, we propose a detection method to enhance the shape perception of small object defects on metal surfaces, namely MetalYOLO. Firstly, a novel location-aware attention mechanism is designed and integrated with deformable convolutions to form a new feature selection module that enhances the focus on key defect features, optimizes the generation of offsets, and improves the model’s ability to adapt to complex shape objects. Secondly, the standard up-sampling module is replaced with a dynamic sampling module that dynamically adjusts the sampling pattern to the input feature distribution, improving computational efficiency and retaining complex or small-scale object features, thereby improving detection accuracy. Finally, a new detail-enhanced detection head is designed to further improve the network’s ability to capture fine-grained details by introducing a detail-enhanced attention-sharing module that utilizes contextual information to selectively suppress irrelevant features, thereby reducing information redundancy. The proposed model is compared with baseline models on the ILS-MB and NEU-DET datasets, and the experimental results show significant improvements in false detection and missed detection rates with only a slight loss in inference speed. 
Meanwhile, the mAP reached 80.4% and 79.0%, respectively, which is 1.7% and 3.2% higher than the baseline algorithm.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TransMambaCC: Integrating Transformer and Pyramid Mamba Network for RGB-T Crowd Counting","authors":"Yangjian Chen, Huailin Zhao, Liangjun Huang, Yubo Yang, Wencan Kang, Jianwei Zhang","doi":"10.1007/s10489-025-06912-5","DOIUrl":"10.1007/s10489-025-06912-5","url":null,"abstract":"<div><p>RGB-T crowd counting is a challenging task that integrates RGB and thermal images to address the limitations of RGB-only approaches in scenes with poor illumination or occlusion. While transformer-based models have shown remarkable success in terms of capturing long-range dependencies, their high computational demands limit their practical applicability. To address this issue, a novel hybrid model named TransMambaCC, which fuses the analytical strength of transformer with the computational efficiency of Mamba, is proposed. This integration not only improves crowd analysis performance, but also significantly reduces computational overhead of the model. Additionally, a Pyramid Mamba module is innovatively designed to address the head-scale variations observed in congested scenes. Extensive experiments conducted on the RGBT-CC dataset demonstrate the superiority of TransMambaCC over the existing approaches in terms of both accuracy and efficiency. Furthermore, the model exhibits strong generalization capabilities, as evidenced by its performance on the ShanghaiTechRGBD dataset. 
The code is available at https://github.com/yjchen3250/TransMambaCC.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing vitiligo stage diagnosis through a reliable multimodal model with uncertainty calibration","authors":"Zhiming Li, Shuying Jiang, Fan Xiang, Chunying Li, Shuli Li, Tianwen Gao, Kaiqiao He, Jianru Chen, Junpeng Zhang, Junran Zhang","doi":"10.1007/s10489-025-06839-x","DOIUrl":"10.1007/s10489-025-06839-x","url":null,"abstract":"<div><p>Vitiligo is a common dermatological disease featuring hypopigmentation. Accurate staging of vitiligo is crucial for enhancing treatment efficacy. However, traditional diagnostic methods, which rely on physicians' subjective judgments, are time-consuming, labor-intensive, and prone to misdiagnosis. Recently, AI-powered multimodal dermatological classification models have demonstrated significant potential in this area. However, the credibility of these models at the decision-making stage requires further refinement. This study proposes a multimodal disease staging diagnostic model with uncertainty calibration to analyze multimodal samples from three stages of vitiligo. The model extracts feature information from various modalities and transforms it into a Dirichlet distribution to assess sample uncertainty. Then, the Dempster-Shafer theory is used to fuse evidence from different modalities, generating a final diagnostic result and an uncertainty score. Additionally, an uncertainty-based loss function is designed. By applying an uncertainty threshold, the model can detect high-uncertainty samples that require additional judgment, effectively reducing the risk of misdiagnosis and missed diagnosis. Experimental results show that this model outperforms existing methods in terms of accuracy, precision, recall, and F1-score. Anomaly detection and noise-resistance experiments verify the model's robustness in handling unknown and noisy data. 
This model offers a new approach for AI-assisted vitiligo diagnosis, which can assist doctors in making more accurate diagnostic decisions and contribute to improving treatment efficiency.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}