{"title":"Dual dynamic transformer for image captioning","authors":"Chun Shan , Chuanle Song , Tongyi Zou , Jiayi Li , Shaoming Liu","doi":"10.1016/j.eswa.2025.128597","DOIUrl":"10.1016/j.eswa.2025.128597","url":null,"abstract":"<div><div>The task of image captioning, widely acclaimed in the field of computer vision, aims to depict the content of an image, wielding a significant impact on people’s lives. Present methodologies for this task typically involve extracting global and local features to capture both overall and intricate details within images. However, the former, reliant on high-level, low-resolution grid features, when directly inputted into transformer encoders, may falter in establishing robust correlations between individual grids, thereby leading to the loss of relationship information between grid features. Additionally, the latter, utilizing region features derived from object detectors, may hinder transformers from comprehending the semantic relationships among regions, resulting in semantic information loss. To tackle these challenges, we introduce a novel Dual Dynamic Transformer (D<span><math><msup><mrow></mrow><mn>2</mn></msup></math></span>T) framework for image captioning, amalgamating the benefits of dynamic grid features and dynamic region features. Specifically, the Dynamic Pseudo-regions Grid (DPG) encoder enhances the strong correlation between grid features by grouping the attention of different grids and dynamically generating pseudo-regions, facilitating superior fusion with region features. Furthermore, the Dynamic Multi-Level Relation Region (DMR<span><math><msup><mrow></mrow><mn>2</mn></msup></math></span>) encoder augments the comprehension of semantic relationships among various region features through attention-based multi-level relations. In the encoding phase, to seamlessly integrate dynamic grid features and dynamic region features, we propose a feature fusion module for combining these two distinct feature types. 
Moreover, additional experiments conducted on the MSCOCO dataset demonstrate that our model achieves state-of-the-art performance without incurring additional parameter overhead.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"292 ","pages":"Article 128597"},"PeriodicalIF":7.5,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatiotemporal dual-branch feature-guided fusion network for driver attention prediction","authors":"Yuekui Zhang , Yunzuo Zhang , Yaoge Xiao , Tong Wang","doi":"10.1016/j.eswa.2025.128564","DOIUrl":"10.1016/j.eswa.2025.128564","url":null,"abstract":"<div><div>Predicting the driver’s gaze area is crucial for safe driving in rapidly changing traffic scenarios. However, existing driver attention prediction models generally suffer from two key limitations: insufficient utilization of spatial scale features, which hinders the precise capture of critical information in the scene; the lack of effective guidance from motion information between video frames, making it difficult to assess dynamic changes in the surrounding environment accurately. To address these issues, we propose a Spatiotemporal Dual-branch Feature-guided Fusion Network (SDFF-Net). Specifically, in the spatial branch, we design a Multi-scale Feature Aggregation (MFA) module to enhance the representation of detailed features by constructing bidirectional sampling and layer-by-layer correlation paths, enabling comprehensive extraction of saliency cues across receptive fields. In the temporal branch, we introduce an Attention Transfer Mechanism (ATM) to guide temporal modeling across consecutive frames, improving the ability to capture long-distance dependencies. Finally, we fuse the spatiotemporal features and decode them to generate the predicted saliency map. Experimental results on the DADA-2000 and TDV datasets show that the proposed SDFF-Net achieves state-of-the-art performance in driver attention prediction, outperforming existing methods in multiple evaluation metrics. 
Benefiting from its efficient dual-branch architecture, SDFF-Net is well-suited for deployment in resource-constrained environments, providing reliable real-time attention prediction, which is of great significance for enhancing driving safety and supporting advanced driver assistance systems in complex traffic scenarios.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"292 ","pages":"Article 128564"},"PeriodicalIF":7.5,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144297355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel approach for deep learning-based market forecasting and portfolio selection incorporating market efficiency","authors":"Poongjin Cho , Kyungwon Kim","doi":"10.1016/j.eswa.2025.128610","DOIUrl":"10.1016/j.eswa.2025.128610","url":null,"abstract":"<div><div>Efficient portfolio construction remains a fundamental challenge for investors, especially in market environments that are constantly changing and uncertain. Although existing portfolio optimization models such as the Black-Litterman framework incorporate predictive views, they generally do not account for the varying levels of market efficiency, which can influence the reliability of those views. To address this limitation, we design a new portfolio construction method that explicitly incorporates market efficiency. We propose a novel framework that adjusts the uncertainty of predictive views according to market efficiency levels. Using this framework, we reconstruct the Black-Litterman portfolio and confirm its potential to enhance returns. Experiments on actual data from the past decade show that deep learning algorithms perform better in volatile or inefficient markets. Additionally, by reflecting prediction uncertainty through market efficiency derived from stationary return series, we develop a portfolio that significantly outperforms the benchmarks, including the traditional Markowitz portfolio and the standard Black-Litterman model without market efficiency adjustments. Our approach minimizes losses and maximizes returns across various market conditions. 
Consequently, this strategy is suitable for pension funds and institutional investors seeking long-term growth and risk management.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"292 ","pages":"Article 128610"},"PeriodicalIF":7.5,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144330878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diffusion-based adversarial attack method against person re-identification","authors":"Kejia Zhang , Yingxin Qin , Haiwei Pan , Baoying Ma","doi":"10.1016/j.eswa.2025.128541","DOIUrl":"10.1016/j.eswa.2025.128541","url":null,"abstract":"<div><div>Person re-identification is a computer vision task aimed at matching pedestrian images of the same identity captured by non-overlapping cameras from different viewpoints. Although deep learning effectively addresses challenges like viewpoint variations and occlusions, these systems remain vulnerable to adversarial attacks. These attacks employ carefully crafted perturbations that are invisible to humans but can manipulate model predictions, raising substantial security concerns. In this work, we propose Diff-AA-PR, a novel diffusion-based adversarial attack framework tailored for person re-identification. By integrating diffusion models with discrete wavelet transform, Diff-AA-PR generates adversarial examples that are both visually inconspicuous and highly effective. Specifically, adversarial conditions are incorporated into the inverse diffusion sampling process to steer the generation of examples toward the decision boundary of the target query distribution. Additionally, discrete wavelet transform decomposes images into multi-scale frequency components, allowing perturbations to be constrained within low-frequency domains to enhance imperceptibility and attack efficiency. 
Extensive experiments on state-of-the-art ReID models across the Market-1501 and CUHK03 datasets validate the superiority of the proposed approach in terms of both attack success rate and stealth.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"291 ","pages":"Article 128541"},"PeriodicalIF":7.5,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144296933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the value of imbalance loss functions in enhancing deep learning-based vulnerability detection","authors":"Xiaoxue Ma , Yanzhong He , Jacky Keung , Cheng Tan , Chuanxiang Ma , Wenhua Hu , Fuyang Li","doi":"10.1016/j.eswa.2025.128504","DOIUrl":"10.1016/j.eswa.2025.128504","url":null,"abstract":"<div><div>Software vulnerability detection is crucial in software engineering and information security, and deep learning has been demonstrated to be effective in this domain. However, the class imbalance issue, where non-vulnerable code snippets vastly outnumber vulnerable ones, hinders the performance of deep learning-based vulnerability detection (DLVD) models. Although some recent research has explored the use of imbalance loss functions to address this issue and enhance model efficacy, they have primarily focused on a limited selection of imbalance loss functions, leaving many others unexplored. Therefore, their conclusions about the most effective imbalance loss function may be biased and inconclusive. To fill this gap, we first conduct a comprehensive literature review of 119 DLVD studies, focusing on the loss functions used by these models. We then assess the effectiveness of nine imbalance loss functions alongside cross entropy (CE) loss (the standard balanced loss function) on two DLVD models across four public vulnerability datasets. Our evaluation incorporates six performance metrics, with results analyzed using the Scott-Knott effect size difference (ESD) test. Furthermore, we employ interpretable analysis to elucidate the impact of loss functions on model performance. 
Our findings provide key insights for DLVD, which mainly include the following: the LineVul model consistently outperforms the ReVeal model; label distribution aware margin (LDAM) loss achieves the highest Precision, while logit adjustment (LA) loss yields the best Recall; Class balanced focal (CB-Focal) loss excels in comprehensive performance on extremely imbalanced datasets; and LA loss is optimal for nearly balanced datasets. We recommend using LineVul with either CB-Focal loss or LA loss to enhance DLVD outcomes. Our source code and datasets are available at <span><span>https://github.com/YanzhongHe/DLVD-ImbalanceLossEmpirical</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"291 ","pages":"Article 128504"},"PeriodicalIF":7.5,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144296784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Omni-frequency diffusion-based regional feature consistency recovery for realistic image super-resolution","authors":"Chenjuan Zuo, Zhiqiang Wei, Xiaodong Wang, Jie Nie, Lei Huang","doi":"10.1016/j.eswa.2025.128575","DOIUrl":"10.1016/j.eswa.2025.128575","url":null,"abstract":"<div><div>Blind super-resolution (BSR) endeavors to recover a high-quality image from its degraded low-resolution counterpart, which suffers from multiple complex and unknown degradations. Existing diffusion-based methods strive to indiscriminately enhance all intricate aspects, disregarding the consistent interconnectedness between specific details and their surrounding contextual characteristics. Consequently, these approaches typically produce excessively intensified and artificial details. In this paper, we propose a novel Regional feature Consistency Recovery framework based on a diffusion model, DiffRCR, which can hierarchically restore the frequency domain information that vanishes in different regions and generate photorealistic details consistent with the regional characteristics. Specifically, we devise a hierarchical recovery method to explore frequency differences and recover different frequency domains at various levels, enabling the hierarchical recovery of distinct degradation regions. Furthermore, DiffRCR harnesses contextual information surrounding intricate details and regenerates these details, ensuring a harmonious consistency with the characteristics of the degraded area. Empirical evidence from comprehensive experiments has substantiated that DiffRCR yields excellent performance in relation to both the fidelity of construction and the perceptual quality when employed for authentic super-resolution tasks. 
The code is available at <span><span>https://github.com/huanglab-research/DiffRCR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"292 ","pages":"Article 128575"},"PeriodicalIF":7.5,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144297252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power grid fault diagnosis based on an improved BERT model","authors":"Xu Zhang , Yuchuan Zheng , Ziqi Zhang , Bowen Liu , Zixing Guo , Juan Wei","doi":"10.1016/j.eswa.2025.128648","DOIUrl":"10.1016/j.eswa.2025.128648","url":null,"abstract":"<div><div>With the increasing complexity of power grid structures, highly automated power grid fault diagnosis has become an important trend in intelligent power grid dispatching operations. To achieve end-to-end power grid fault diagnosis, a power grid fault diagnosis method based on an improved bidirectional encoder representation from transformers (BERT) model is proposed in this paper. This method addresses the alarm information text received by an energy management system (EMS) when the power grid fails, with the goal of realizing intelligent power grid fault diagnosis without logical interventions. First, a comprehensive feature vector is designed to serve as the input of the proposed model. This feature vector consists of a word embedding vector, a sentence embedding vector and a position embedding vector. Inspired by automatic summary generation technology, the output of the model is designed as a paragraph of descriptive text summarizing the core features of the target fault, including the name of the faulty equipment and the name of the nonoperating circuit breaker. Second, a convolutional time function is constructed to account for the time series distribution characteristics of the alarm information, and an improved BERT model with a multihead self-attention mechanism involving this convolutional time function is proposed to achieve end-to-end power grid fault diagnosis. The process extends from fault alarm information to descriptive core fault feature summary text. 
Finally, a simulated system and real grid cases corroborate the notion that the proposed power grid fault diagnosis method based on improved BERT significantly outperforms the traditional method in terms of diagnostic speed and accuracy.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"292 ","pages":"Article 128648"},"PeriodicalIF":7.5,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144314217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Locally similar multi-hop fusion GNNs with data augmentation for early Alzheimer’s detection","authors":"Gai Li , Yuwen Zhang , Xuegang Song , Peng Yang , Lei Dong , Yaohui Huang , Xiaohua Xiao , Tianfu Wang , Shuqiang Wang , Baiying Lei","doi":"10.1016/j.eswa.2025.128333","DOIUrl":"10.1016/j.eswa.2025.128333","url":null,"abstract":"<div><div>Alzheimer’s disease (AD) is an irreversible brain disease that has an enormous impact on individuals and society. However, existing AD diagnostic models based on the spatiotemporal correlation of resting-state functional magnetic resonance imaging (rs-fMRI) are unable to focus on temporal correlation information between long-distance time points. In addition, graph neural networks (GNNs) based on imaging information and phenotypic information suffer from excessive smoothing or information loss. To address these issues, we propose a local similarity multi-hop fusion graph neural network (LSMHF-GNN) for the early classification of AD. The main work includes three aspects: 1) The dynamic brain functional connectivity network (dBFC) is constructed using the sliding window method with data enhancement to address the problem of imperfect use of information regarding the long-term brain function damage caused by AD. 2) The LSMHF-GNN is constructed by combining neuroimaging and non-imaging information to alleviate the problem of imperfect use of information and the problem of excessive smoothing or message passing failure that is prone to occur with heterogeneous graph message delivery. 3) We discovered key brain regions that are closely associated with early AD and found abnormal connectivity of lesioned brain regions at various stages of AD deterioration. 
The results of model validation in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database showed that the LSMHF-GNN achieved competitive results in the diagnosis of early AD and identified abnormal connectivity consistent with clinical diagnosis.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"292 ","pages":"Article 128333"},"PeriodicalIF":7.5,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144314220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Employing antenna selection to improve energy efficiency in massive MIMO system based on optimal selection of resources","authors":"Abhishek M.B. , Vibha T.G. , Bindu H.M. , Lavanya Krishna Murthy , Santhosh B.","doi":"10.1016/j.eswa.2025.128507","DOIUrl":"10.1016/j.eswa.2025.128507","url":null,"abstract":"<div><div>Multiple-Input Multiple-Output (MIMO) techniques require a large number of Base Stations (BSs) and antennas to efficiently serve a large volume of users. Antenna selection is important for solving the issues of increased costs of Radio Frequency (RF) chains and significant power consumption. The primary objective of the proposed work is to consider a BS with an optimal number of antennas to minimize the power utilization of the MIMO system. Therefore, an improved MIMO system is developed using optimal selection of antennas and resource utilization to maximize Spectral Efficiency (SE) and Energy Efficiency (EE). The SE and EE of the system are increased by optimally selecting the active antennas and resources required for the operation using the Hybrid Secretary Bird with Duck Swarm Algorithm (HSB-DSA). Thus, the proposed MIMO system uses adaptive algorithms to dynamically select the active antennas based on real-time channel conditions and system requirements, and this optimal allocation is used to improve the performance and efficiency of wireless transmission networks. To ensure the reliability of the improved model, performance analyses are conducted on the improved optimal antenna selection and resource utilization model with different encoding and modulation techniques. 
Space-Time Block Code (STBC) encoding techniques such as Alamouti, Golden, and Silver are used in the experiments, together with modulation techniques including <span><math><mrow><mn>2</mn><mo>×</mo><mn>2</mn></mrow></math></span> and <span><math><mrow><mn>4</mn><mo>×</mo><mn>4</mn></mrow></math></span> Differential Chaos Shift Keying (DCSK), to assess the effectiveness of MIMO systems in terms of throughput and data rate under diverse channel conditions.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"292 ","pages":"Article 128507"},"PeriodicalIF":7.5,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144330876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI-driven tactical recommendations for table tennis: decision optimization with probabilistic interaction model and technical quantification system","authors":"Duo Na , Qiuhu Xue","doi":"10.1016/j.eswa.2025.128616","DOIUrl":"10.1016/j.eswa.2025.128616","url":null,"abstract":"<div><div>Current tactics in table tennis predominantly rely on empirical knowledge, lacking systematic and adaptive decision-making frameworks. To address this gap, an intelligent tactical decision-making system integrating probabilistic modeling, quantified technical proficiency, and deep reinforcement learning is proposed in this study. A probabilistic interaction model is established to formalize tactical dynamics, explicitly defining decision variables—spin, drop point, and quality—while accounting for technical constraints between players. Central to the framework is the Technical Capability Parameter Table (TCPT), a novel quantification system that evaluates athletes’ adaptability and stability across diverse ball conditions. Leveraging these components, the Multi-Head Hybrid-Decision Proximal Policy Optimization (MHHD-PPO) algorithm is developed to optimize hybrid action spaces (discrete tactical choices and continuous quality control) and exploit temporal dependencies in gameplay. Experiments demonstrate that agents trained with MHHD-PPO achieve a 63.5 % win rate against baseline strategies, with real-world validation involving university athletes revealing a significant win rate improvement (e.g., 48 % to 59 % in Player B vs. Player C matchups). The system provides actionable tactical recommendations through three operational modes: (1) adaptive serve/return strategies tailored to opponent weaknesses, (2) dilemma-specific solution generation, and (3) self-play optimization. 
By bridging theoretical models with practical training paradigms, this work advances the intelligent development of table tennis tactics, offering coaches and athletes a data-driven tool for strategic refinement. The integration of probabilistic interaction modeling, technical proficiency quantification, and hybrid reinforcement learning establishes a replicable framework for tactical intelligence in dynamic sports.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"292 ","pages":"Article 128616"},"PeriodicalIF":7.5,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}