Title: Dual-scale adaptive attention-based Vision Transformer with iterative refinement for clarity and consistency in multi-focus image fusion
Authors: Gengrui Li, Daoyun Tang, Jinhuan Huang, Shaoning Zhu, Jiangtao Cao
DOI: 10.1016/j.engappai.2025.112777
Journal: Engineering Applications of Artificial Intelligence, Vol. 163, Article 112777 (published 2025-10-22)
Abstract: Multi-focus Image Fusion (MFIF) plays a prominent role in combining the focused regions of several source images into a single all-in-focus fused image. However, existing approaches struggle to maintain global spatial coherence and sharp details. To overcome these limitations, the Dual-Scale Adaptive Attention-Based Vision Transformer (DAA-ViT) model is proposed, which integrates fine-scale and coarse-scale attention with the aim of preserving local high-resolution information along with structural coherence. Additionally, an Iterative Refinement Fusion (IRF) module is introduced that refines focus boundaries over multiple iterations, enhancing overall image definition while mitigating fusion artifacts and focus-selection errors. This Artificial Intelligence (AI)-based approach is particularly effective in complex scenes with inconsistent depth levels, making it suitable for applications such as remote sensing and medical image processing. Experimental results on several benchmark datasets demonstrate that the proposed method outperforms existing methods, with a Mutual Information (MI) of 8.9671, a Structural Similarity Index Measure (SSIM) of 0.9211, a Peak Signal-to-Noise Ratio (PSNR) of 36.728 dB, and a Root Mean Square Error (RMSE) of 1.5482. Compared with the existing Swin Transformer and Convolutional Neural Network (STCU-Net) model, the proposed model attains a 2.65% improvement in PSNR, a 1.99% improvement in MI, a 1.11% improvement in SSIM, and a 5.13% reduction in RMSE. These findings demonstrate the effectiveness of AI-based fusion strategies in delivering high-quality all-in-focus images and underline their applications in medical imaging and remote sensing.
Title: Translating regulatory clauses into executable codes for building design checking via large language model driven function matching and composing
Authors: Zhe Zheng, Jin Han, Ke-Yin Chen, Xin-Yu Cao, Xin-Zheng Lu, Jia-Rui Lin
DOI: 10.1016/j.engappai.2025.112823
Journal: Engineering Applications of Artificial Intelligence, Vol. 163, Article 112823 (published 2025-10-22)
Abstract: Translating clauses into executable code is a vital stage of automated rule checking (ARC) and is essential for effective building design compliance checking, particularly for rules with implicit properties or complex logic requiring domain knowledge. By systematically analyzing building clauses, 66 atomic functions are first defined to encapsulate common computational logic. Then, LLM-FuncMapper is proposed: a large language model (LLM)-based approach with rule-based adaptive prompts that match clauses to atomic functions. Finally, executable code is generated by composing functions through the LLM. Experiments show that LLM-FuncMapper outperforms fine-tuning methods by 19% in function matching while significantly reducing manual annotation effort. A case study demonstrates that LLM-FuncMapper can automatically compose multiple atomic functions to generate executable code, boosting rule-checking efficiency. To our knowledge, this research represents the first application of LLMs for interpreting complex design clauses into executable code, which may shed light on the further adoption of LLMs in the construction domain.
Title: Multitasking optimization for personalized exercise group recommendation in E-learning environments
Authors: Haipeng Yang, Sibo Liu, Zihao Chen, Yuanyuan Ge, Lei Zhang
DOI: 10.1016/j.engappai.2025.112820
Journal: Engineering Applications of Artificial Intelligence, Vol. 163, Article 112820 (published 2025-10-22)
Abstract: Personalized exercise group recommendation (PEGR) aims to select a set of exercises from a large exercise bank for students and plays an important role in E-learning. Owing to the complexity of real application scenarios, PEGR is usually modeled as a large-scale constrained multi-objective optimization problem and solved with multi-objective evolutionary algorithms (MOEAs). However, the "curse of dimensionality" and complex constraint handling are the two challenges encountered when designing MOEAs for PEGR. To this end, we propose a novel evolutionary tri-tasking algorithm named ETT-PEGR, in which two auxiliary tasks are constructed to help solve the original task through knowledge transfer. Specifically, the first, concept-recommendation auxiliary task recommends knowledge concepts instead of exercises to students, which accelerates the convergence of the original task since the number of concepts is much smaller than the number of exercises. The second, constraint-ignored auxiliary task helps solutions of the original task cross infeasible regions. In addition, a novel knowledge transfer mechanism based on different encoding strategies is proposed for the original task and the two auxiliary tasks, effectively realizing knowledge transfer between them. Experimental results on four popular datasets show that ETT-PEGR outperforms state-of-the-art algorithms for PEGR.
{"title":"A new multi-object tracking algorithm based on Sparse Detection Transformer","authors":"Jun Miao , Maoxuan Zhang , Yuanhua Qiao","doi":"10.1016/j.engappai.2025.112666","DOIUrl":"10.1016/j.engappai.2025.112666","url":null,"abstract":"<div><div>Multi-object tracking (MOT) is crucial for intelligent surveillance and autonomous driving. However, existing Transformer-based methods often suffer from an accuracy-efficiency trade-off due to high computational complexity, limiting real-time applicability. To address this, we propose SparseDeTrack (Sparse Detection Tracking), an efficient MOT framework based on the tracking-by-detection (TBD) paradigm. In detection, we employ a sparse token Transformer with a 30 % token retention rate, effectively reducing computational cost while retaining essential features. In tracking, we remove the Re-Identification (ReID) module and enhance the Extended Kalman Filter (EKF) by directly predicting the width and height instead of the aspect ratio of bounding boxes, improving both localization accuracy and nonlinear motion modeling. Furthermore, ByteTrack (Multi-Object Tracking by Associating Every Detection Box) is integrated for secondary association, increasing robustness under occlusion. We conduct extensive experiments on MOTChallenge 17 (MOT17), MOTChallenge 20 (MOT20), and DanceTrack benchmarks. On the MOT17 test set, SparseDeTrack achieves a Multiple Object Tracking Accuracy (MOTA) of 75.4, outperforming Transformer-based methods such as MOTR (End-to-End Multiple-Object Tracking with Transformer), Trackformer (Multi-Object Tracking with Transformers), and TransTrack (Multiple Object Tracking with Transformer) by 2.0, 1.3, and 0.2 points, respectively, while attaining a high inference speed of 44.5 frames per second (FPS), balancing accuracy and efficiency. It reaches 65.6 MOTA on crowded MOT20 and 89.1 MOTA on nonlinear-motion DanceTrack, comparable to state-of-the-art methods. These results confirm that SparseDeTrack delivers both high-precision tracking and real-time inference in complex scenarios, making it a promising solution for real-world applications in intelligent surveillance and autonomous driving.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"163 ","pages":"Article 112666"},"PeriodicalIF":8.0,"publicationDate":"2025-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145334894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Tissue-contrastive semi-masked autoencoders for segmentation pretraining on chest computed tomography
Authors: Jie Zheng, Ru Wen, Can Han, Wei Chen, Chen Liu, Jun Wang, Kui Su
DOI: 10.1016/j.engappai.2025.112885
Journal: Engineering Applications of Artificial Intelligence, Vol. 163, Article 112885 (published 2025-10-22)
Abstract: Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which faces two limitations when applied to chest Computed Tomography (CT): (1) inefficient feature learning due to the complex anatomical details presented in CT images, and (2) suboptimal knowledge transfer owing to the input disparity between upstream and downstream models. To address these issues, we propose a new MIM method named Tissue-Contrastive Semi-Masked Autoencoder (TCS-MAE) for modeling chest CT images. Our method has two novel designs: (1) a tissue-based masking-reconstruction strategy to capture more fine-grained anatomical features, and (2) a dual-AE architecture with contrastive learning between the masked and original image views to bridge the gap between the upstream and downstream models. Through these strategies, the pretrained model can learn homogeneous tissue representations that improve the segmentation of heterogeneous lesions. To validate our method, we systematically investigate representative contrastive, generative, and hybrid self-supervised learning methods on tasks involving segmenting pneumonia, mediastinal tumors, and various organs. The results demonstrate that, compared to existing methods, our TCS-MAE more effectively learns tissue-aware representations, thereby significantly enhancing segmentation performance across all tasks. The code and datasets are available at: https://github.com/zhengjjjjie/TCS-MAE.
Title: Temporal diffuser: Timing scale-aware modulation for sign language production
Authors: Kim-Thuy Kha, Anh H. Vo, Van-Vang Le, Oh-Young Song, Yong-Guk Kim
DOI: 10.1016/j.engappai.2025.112739
Journal: Engineering Applications of Artificial Intelligence, Vol. 163, Article 112739 (published 2025-10-22)
Abstract: Recent advances in Sign Language Production (SLP) highlight denoising diffusion models as promising alternatives to traditional autoregressive methods. Most existing approaches follow a two-stage pipeline that encodes sign motion into discrete latent codes, often sacrificing Space-Time fidelity and requiring gloss annotations or complex codebooks. Transformer-based models aim to simplify this, but often produce overly smooth, unnatural motions. We introduce Sign Language Production with Scale-Aware Modulation (SignSAM), a novel single-stage, gloss-free SLP framework that directly synthesizes motion in continuous space, preserving fine temporal details. At its core is a Space-Time U-Net that learns compact temporal features by jointly downscaling the frame and sign feature dimensions, thereby reducing computational cost compared to a no-pyramid U-Net or a pyramid U-Net without consistency between dimensions. To further enhance temporal precision, we propose a Timing Scale-Aware Modulation module that fuses multiscale temporal resolutions for better motion coherence. Experiments on PHOENIX14T and How2Sign show that SignSAM achieves state-of-the-art (SOTA) fluency, accuracy, and naturalness, offering an efficient and expressive solution for SLP. Our project homepage is https://kha-kim-thuy.github.io/SLP-Demo/.
Title: Adaptive spatial-temporal graph attention network for real-time traffic forecasting
Authors: Hao Huang, Jee-Hyong Lee, Yanling Ge, Seok-Beom Roh, Xue Zhao
DOI: 10.1016/j.engappai.2025.112883
Journal: Engineering Applications of Artificial Intelligence, Vol. 163, Article 112883 (published 2025-10-22)
Abstract: Accurate and efficient Multivariate Time Series Forecasting (MTSF) plays a critical role in intelligent transportation systems by supporting real-time traffic management. However, achieving reliable forecasting remains challenging due to complex and dynamically evolving spatial-temporal patterns. Existing forecasting methods often fail to adapt effectively to these dynamic traffic conditions and typically incur high computational costs, significantly limiting their deployment in real-time traffic management scenarios. To address these engineering challenges, this study proposes a novel Attention-based Spatial-Temporal Network (ASTNet), explicitly designed for adaptive and efficient real-time traffic forecasting. ASTNet introduces two innovative Artificial Intelligence (AI)-driven modules: an Adaptive Spatial Graph Encoder (ASGE), which dynamically models evolving spatial dependencies from real-time traffic data, thus overcoming the limitations of static graph structures; and a Temporal Attention-Gated Unit (TAGU), which efficiently captures critical temporal dependencies through the integration of recurrent gating mechanisms and self-attention techniques. Extensive evaluations conducted on widely used traffic benchmark datasets (PEMS04, METR-LA, etc.) confirm that ASTNet achieves superior predictive accuracy and robustness compared to state-of-the-art methods, while significantly reducing inference latency. Ablation studies further validate that the combined innovations of ASGE and TAGU are crucial for ASTNet's outstanding performance, highlighting its practical suitability and strong potential for deployment in real-time intelligent transportation applications.
{"title":"Segmentation-enhanced Medical Visual Question Answering with mask-prompt alignment using contrastive learning and multitask object grounding","authors":"Qishen Chen , Huahu Xu , Wenxuan He , Xingyuan Chen , Minjie Bian , Honghao Gao","doi":"10.1016/j.engappai.2025.112866","DOIUrl":"10.1016/j.engappai.2025.112866","url":null,"abstract":"<div><div>Medical Visual Question Answering (MedVQA) aims to provide clinical suggestions by analyzing medical images in response to textual queries. However, existing methods struggle to accurately identify anatomical structures and pathological abnormalities, leading to unreliable predictions. Many deep learning-based approaches also lack interpretability, making their diagnostic reasoning opaque. To address these challenges, this paper proposes Mask-Prompt Aligned Visual Question Answering (MPA-VQA), a two-stage framework that integrates segmentation information into the MedVQA process. First, a segmentation model is trained to detect key structures within medical images. To mitigate the issue of limited segmentation annotations, this paper introduces an improved CutMix-based data augmentation strategy. Second, segmentation masks are used to generate prompts, which are incorporated into the question-answering process for the first time to enhance interpretability. Third, to improve the alignment between image, mask, and prompt representations, this paper proposes a dual-granularity mask-prompt alignment (MPA) method. At the image level, MPA employs contrastive learning to encourage global consistency, while at the object level, it leverages multi-task object grounding to enhance localization accuracy. A mask-guided attention mechanism is also introduced to ensure the model focuses on clinically relevant image regions. Finally, the proposed MPA-VQA is validated on the SLAKE and MedVQA-GI datasets, demonstrating state-of-the-art performance. Notably, MPA-VQA improves location-related question accuracy by 6.37% on MedVQA-GI. MPA-VQA is also a plug-and-play framework that can be seamlessly integrated into existing MedVQA architectures without requiring major modifications.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"163 ","pages":"Article 112866"},"PeriodicalIF":8.0,"publicationDate":"2025-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145335242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spiking neural networks with uncertainty model of stochastic sampling for circuit yield enhancement","authors":"Zenan Huang, Wenrun Xiao, Haojie Ruan, Shan He, Donghui Guo","doi":"10.1016/j.engappai.2025.112523","DOIUrl":"10.1016/j.engappai.2025.112523","url":null,"abstract":"<div><div>In semiconductor manufacturing, yield analysis plays a critical role in optimizing production processes, but traditional methods, such as Monte Carlo simulations, often rely on idealized models and require extensive computational resources. These approaches struggle to account for the inherent uncertainties of real-world manufacturing, limiting their practical applicability. Spiking Neural Networks (SNNs), inspired by biological neural processes, offer a promising solution by efficiently handling large-scale data while maintaining low power consumption and real-time processing capabilities. This paper introduces an uncertainty-aware spiking learning model that reduces the impact of non-ideal simulation results by incorporating input uncertainties through stochastic sampling, where neuron firing states are influenced by both input noise and neuronal characteristics. To further improve yield, the model leverages reinforcement learning to optimize process parameters iteratively. Extensive experiments on two circuit yield simulation datasets demonstrate that the proposed method outperforms traditional approaches in handling uncertainties and provides more reliable and accurate yield predictions, offering a robust and efficient alternative for semiconductor process optimization.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"163 ","pages":"Article 112523"},"PeriodicalIF":8.0,"publicationDate":"2025-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145334893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: GMFIMamba: Remote sensing change detection based on group Mamba feature interaction
Authors: Wenliang Xu, Suting Chen, Feilong Bi, Chao Wang, Xiao Shu
DOI: 10.1016/j.engappai.2025.112878
Journal: Engineering Applications of Artificial Intelligence, Vol. 163, Article 112878 (published 2025-10-22)
Abstract: With the advancement of satellite technology, high-resolution remote sensing images have been widely used in the field of change detection. Building Change Detection (BCD) and Building Damage Assessment (BDA) are both sub-tasks of change detection. BCD aims to detect structural changes in buildings over time, whereas BDA focuses on assessing the level of building damage after a disaster. BCD is of great value for urban planning, while BDA plays a crucial role in post-disaster rescue efforts. To address these tasks, we propose a change detection method based on Mamba, named GMFIMamba. Specifically, we design a Convolution-Visual State Space (Conv-VSS) block, which combines the local feature extraction capability of Convolutional Neural Networks (CNNs) with the global feature modeling ability of Mamba. By integrating local and global features, our approach improves the accuracy of change-region detection. To tackle the issue of insufficient feature extraction for small-scale buildings in existing models, we introduce the Multi-branch Dilated Convolution Feature Enhancement Module (MCFEM). In addition, we design the Grouped Mamba-Based Bitemporal Features Interaction Module (GMBFIM) to facilitate effective interaction between bitemporal images, leading to more accurate change feature extraction. Experiments on three public datasets demonstrate that the proposed method achieves superior performance in both BCD and BDA tasks, proving its effectiveness.