{"title":"ClipCap++: An efficient image captioning approach via image encoder optimization and LLM fine-tuning","authors":"Ruiqin Wang , Ye Wu , Zhenzhen Sheng","doi":"10.1016/j.asoc.2025.113469","DOIUrl":"10.1016/j.asoc.2025.113469","url":null,"abstract":"<div><div>ClipCap (CLIP prefix for image captioning), a leading image captioning model, has limited ability to recognize images from specific domains. This study presents ClipCap++, an enhanced version of ClipCap that integrates key-value pair and residual connection modules. The key-value pair module implements a few-shot learning strategy by incorporating domain-specific knowledge, thereby improving the model's ability to recognize specialized image categories. The residual connection module optimizes the weight distribution between the pre-trained model and the key-value pair module, enhancing the model's transfer learning performance. During inference, the model processes an input image through a multi-stage pipeline: (1) the visual encoder extracts image features to generate a hard visual prompt, (2) the key-value pair module dynamically constructs a domain-specific soft prompt, and (3) these complementary prompts are jointly fed into the large language model to synthesize the final image description. Extensive experiments on in-domain, near-domain, and cross-domain tasks show that ClipCap++ surpasses state-of-the-art models in accuracy, training efficiency, and generalization.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"180 ","pages":"Article 113469"},"PeriodicalIF":7.2,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144270805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heuristic information-rich evolutionary modelling for engine soft sensors of hybrid electric vehicles","authors":"Ji Li , Xu He , Quan Zhou , Carl Anthony , Bo Wang , Guoxiang Lu , Hongming Xu","doi":"10.1016/j.asoc.2025.113468","DOIUrl":"10.1016/j.asoc.2025.113468","url":null,"abstract":"<div><div>Given the explosive demand in the electrified powertrain market, modelling schemes with strong robustness, low cost, and fast implementation are urgently required for hybrid vehicle engine development. This paper presents a data-driven holistic solution that integrates heuristic information-rich feature selection for engine soft sensors (fuel consumption, thermal efficiency, and volumetric efficiency), namely a heuristic information-rich warm-start evolutionary modelling framework. Five filter methods are developed as heaters, and the features they select are used to warm up the initialisation process of the evolutionary modelling, alleviating the inefficient exploration and local optima caused by the pseudo-random initialisation of a single wrapper during optimisation. Meanwhile, a new factor of heuristic information richness is introduced to determine and adjust the proportion of filter particles; it further accelerates evolutionary convergence through filter information guidance and avoids local optima through free exploration by particles without filter information, balancing computational efficiency and global search capability. Validation on the test bench of a BYD 1.5 L naturally aspirated engine designed specifically for a hybrid powertrain shows that the Lasso method is the best heater, helping the proposed framework reduce the mean squared error by up to 54.9 % compared with the cold-start framework. Compared with modelling frameworks used in industry, the proposed framework achieves equivalent prediction performance while reducing the database size by up to 85 %.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"180 ","pages":"Article 113468"},"PeriodicalIF":7.2,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144270884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent decision support system for multi-objective 3D container loading using genetic algorithm combined with artificial bee colony","authors":"Suriya Phongmoo , Komgrit Leksakul , Chaichana Suedumrong , Chakkrapong Kuensaen","doi":"10.1016/j.asoc.2025.113473","DOIUrl":"10.1016/j.asoc.2025.113473","url":null,"abstract":"<div><div>Efficient container loading is a complex and critical logistics challenge, especially when dealing with strongly heterogeneous boxes in three dimensions. This study proposes an intelligent decision support system that addresses the 3D Single Container Loading Problem (3D-SCLP) using a hybrid meta-heuristic approach combining Genetic Algorithm (GA) and Artificial Bee Colony (ABC). The system introduces rotation constraints as a decision variable and optimizes for two objectives: maximizing profit and minimizing unused space. A mathematical model based on the bottom-left fill (BLF) method was developed to ensure feasible loading with non-overlapping placements and valid rotations. Experimental results on 15 real-world and 225 synthetic test cases demonstrate the superiority of the proposed GA+ABC method over standalone algorithms in both solution quality and robustness. The system achieves the lowest hypervolume metric (119.28), indicating better convergence to Pareto-optimal fronts, and provides practical feasibility for real-world logistics optimization.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"180 ","pages":"Article 113473"},"PeriodicalIF":7.2,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144270855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DFSMCG-Net: A Siamese change detection network based on Differential Feature Selection and Multi-Scale Guidance Strategies","authors":"Hang Xue, Ke Liu, Caiyi Huang, Xianhong Meng","doi":"10.1016/j.asoc.2025.113372","DOIUrl":"10.1016/j.asoc.2025.113372","url":null,"abstract":"<div><div>Change detection technology effectively identifies surface changes but encounters significant challenges, including class imbalance between foreground and background and interference from pseudo-changes caused by factors such as illumination variations and geometric distortions. We propose a location-sensitive Differential Feature Selection and Multi-Scale Change Feature Guidance Network (DFSMCG-Net) to address these issues. The DFSMCG-Net introduces a Differential Feature Selection Module (DFSM) that leverages the spatial location information of bi-temporal features. This module captures spatiotemporal differential features at the exact location along the X-axis and Y-axis and integrates these features through cross-fusion to establish long-range pixel dependencies. The resulting multi-level differential features provide the network with a detailed temporal context for detecting changes. We develop a Multi-Scale Change Feature Guidance Module (MCFGM) based on a multi-head self-attention mechanism to further enhance the fusion of multi-level differential features and suppress interference from non-differential features. This module assigns each attention head a distinct non-overlapping window, dynamically adjusting window sizes according to the feature map dimensions. This approach facilitates the integration of multi-scale differential features, improving the network’s capacity to represent change-related features. Experimental results demonstrate that the proposed DFSMCG-Net performs significantly better than state-of-the-art methods on benchmark datasets, including LEVIR-CD, CDD, SYSU-CD and S2Looking. The model is particularly effective in mitigating pseudo-change phenomena under conditions of extreme class imbalance.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"180 ","pages":"Article 113372"},"PeriodicalIF":7.2,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144270804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A bilayer segmentation-recombination network for accurate segmentation of overlapping C. elegans","authors":"Mengqian Ding , Jun Liu , Yang Luo , Jinshan Tang","doi":"10.1016/j.asoc.2025.113459","DOIUrl":"10.1016/j.asoc.2025.113459","url":null,"abstract":"<div><div>Caenorhabditis elegans (<em>C. elegans</em>) is an excellent model organism because of its short lifespan and high degree of homology with human genes, and it has been widely used in a variety of human health and disease models. However, segmenting <em>C. elegans</em> remains challenging for two reasons: 1) the movement of <em>C. elegans</em> cannot be controlled, and multiple nematodes often overlap, blurring their boundaries and making it impossible to clearly follow the life trajectory of an individual nematode; and 2) in microscope images of overlapping <em>C. elegans</em>, the translucent tissues at the edges obscure each other, leading to inaccurate boundary segmentation. To solve these problems, a Bilayer Segmentation-Recombination Network (BR-Net) for <em>C. elegans</em> instance segmentation is proposed. The network consists of three parts: a Coarse Mask Segmentation Module (CMSM), a Bilayer Segmentation Module (BSM), and a Semantic Consistency Recombination Module (SCRM). The CMSM extracts the coarse mask, and we introduce a United Attention Module (UAM) into the CMSM to make it more aware of nematode instances. The BSM segments the aggregated <em>C. elegans</em> into overlapping and non-overlapping regions. These regions are then integrated by the SCRM, which introduces semantic consistency regularization to segment nematode instances more accurately. Finally, the effectiveness of the method is verified on the <em>C. elegans</em> dataset. The experimental results show that BR-Net is highly competitive and outperforms other recently proposed segmentation methods on <em>C. elegans</em> occlusion images.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"180 ","pages":"Article 113459"},"PeriodicalIF":7.2,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144279968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regularizing Model Predictive Control for pixel-based long-horizon tasks","authors":"Yao-Hui Li, Feng Zhang, Qiang Hua, Chun-Ru Dong","doi":"10.1016/j.asoc.2025.113377","DOIUrl":"10.1016/j.asoc.2025.113377","url":null,"abstract":"<div><div>Planning has proven to be an effective strategy for dealing with complex tasks. However, because of limited computational budgets and accumulated model biases, planning for pixel-based long-horizon tasks with limited samples remains a great challenge. To address this issue, a <strong>R</strong>egularized <strong>M</strong>odel <strong>P</strong>redictive <strong>C</strong>ontrol (<strong>RMPC</strong>) method is proposed in this study. RMPC performs trajectory optimization using short-term reward estimates and long-term return estimates, which avoids the high burden of long-horizon planning. Additionally, an implicit regularization mechanism is employed to improve the robustness of the learned environment model and the reliability of value function estimation, which helps to reduce the risk of accumulated model biases. Extensive comparison experiments and ablation studies are performed on benchmark datasets to evaluate the proposed RMPC. Empirical results show that RMPC outperforms previous state-of-the-art algorithms in terms of sample efficiency (20.88% performance improvement) and model stability (56.39% standard deviation reduction) on pixel-based continuous control tasks from the DMControl-100k benchmark. Our code is available at: https://github.com/Arya87/RMPC.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"181 ","pages":"Article 113377"},"PeriodicalIF":7.2,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144330809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning: Historical overview from inception to actualization, models, applications and future trends","authors":"Olufisayo S. Ekundayo, Absalom E. Ezugwu","doi":"10.1016/j.asoc.2025.113378","DOIUrl":"10.1016/j.asoc.2025.113378","url":null,"abstract":"<div><div>Deep learning stands at the forefront of contemporary machine learning techniques and is well known for its outstanding predictive accuracy, adaptability to data variability, and remarkable ability to generalize across diverse domains. These attributes have spurred rapid progress and the emergence of novel iterations within the discipline. Yet this swift evolution often obscures the foundational breakthroughs, and even trailblazing researchers risk fading into obscurity despite their seminal contributions. This study aims to provide a historical narrative of deep learning, tracing its origins from the cybernetic era to its current state-of-the-art status. We critically examine the contributions of individual pioneering scholars who have profoundly influenced the development of deep neural networks, organized under the taxonomy of supervised, unsupervised, and reinforcement learning. The study also discusses trending deep neural network architectures, explaining their operational principles, examining associated challenges, exploring real-world applications, and outlining potential future directions that could offer a starting point for aspiring researchers in the field.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"181 ","pages":"Article 113378"},"PeriodicalIF":7.2,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multilevel probabilistic wind power forecasting using an adaptive Informer network","authors":"Sen Xie , Yuyang Hua , Shan Lu , Xin Jin","doi":"10.1016/j.asoc.2025.113460","DOIUrl":"10.1016/j.asoc.2025.113460","url":null,"abstract":"<div><div>Effective and feasible wind power forecasting is critical for resource allocation and the safe control of power systems. Nevertheless, the volatility and randomness of wind speed lead to deviations in actual wind power output. Therefore, a multilevel probabilistic wind power forecasting strategy using an adaptive Informer network is developed. To separate the long-term trend and periodic fluctuations of the raw series, wind power is first decomposed into equal-length sequences of multilevel frequencies through the maximal overlap discrete wavelet transform (MODWT). Simultaneously, a piecewise adaptive loss function and a wide-range activation function are incorporated into a novel Informer network, and the inherent structure and nonlinear features at each frequency are extracted with two encoder layers and one decoder layer. Moreover, ensemble batch prediction intervals (EnbPI) are exploited to extend the deterministic forecasts to probabilistic ones. Finally, a historical dataset from an offshore wind power system in Belgium is used to verify the forecasting performance, and quantitative analysis shows that the model achieves a mean absolute error of 2.5 % and a root mean squared error of 3.8 %. The developed strategy handles the volatility and complexity of wind data, providing reliable support for real wind power plants.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"180 ","pages":"Article 113460"},"PeriodicalIF":7.2,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144270806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive feature mixing with Vision Transformers for clinical image analysis","authors":"Susmita Ghosh, Swagatam Das","doi":"10.1016/j.asoc.2025.113259","DOIUrl":"10.1016/j.asoc.2025.113259","url":null,"abstract":"<div><div>The Vision Transformer (ViT) is an adaptation of the Transformer architecture that shows promise in image classification. However, limited training samples and the complex attributes of clinical images hinder its performance in identifying medical conditions. To address this challenge, we propose a modified ViT architecture called ReMixViT, which incorporates an efficient MLP-Mixer layer and reorders the residual blocks within the encoder block. This modification improves feature mixing and enhances the model’s generalization ability. Additionally, we design two hybrid architectures, Res-ReMixViT and Res-ReMixViT+, by integrating a Convolutional Neural Network (ResNet50) with ReMixViT encoder blocks, considering feature maps of single and multiple scales, respectively. We evaluated the proposed architectures on six diverse medical imaging datasets with varying modalities and medical conditions. Our comparative study reveals that ReMixViT and the hybrid models outperform the vanilla ViT models and the hybrid models with ViT encoder blocks, respectively, based on widely accepted performance measures. Specifically, we observe F1-score improvements of 4.62% and 3.08%. Moreover, when combined with data augmentation algorithms, the proposed hybrid architectures surpass other state-of-the-art hybrid networks. In addition to performance evaluation, we provide visual explanations through attention maps and the gradient flow of our model. These visual explanations contribute to the interpretability of the Artificial Intelligence (AI) system, assisting medical practitioners in drawing inferences from an explainable AI perspective. Finally, an extended study demonstrates that the proposed modifications can be successfully adapted to other vision transformer architectures, resulting in enhanced performance.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"181 ","pages":"Article 113259"},"PeriodicalIF":7.2,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable monocular 3D detector with Superpixel Feature Pyramid Network","authors":"Dongliang Ma , Fang Zhao , Ye Li , Xin Qu , Xin Jiang , Hao Wu , Xi Chen , Min Liu","doi":"10.1016/j.asoc.2025.113389","DOIUrl":"10.1016/j.asoc.2025.113389","url":null,"abstract":"<div><div>Monocular 3D object detection plays a pivotal role in vehicle perception systems. Current methods frequently struggle to extract scene-level semantic information effectively, and monocular 3D detectors tailored to embedded devices with varying computing power may still be scarce. This paper introduces MonoYolo, a scalable detector designed for practicality and efficiency under varying resource constraints. In particular, we design a Superpixel Feature Pyramid Network (SFPN) that automatically groups pixels with similar attributes. Experimental results on the KITTI and nuScenes datasets show that the large MonoYolo models outperform strong monocular detectors, while the lightweight model maintains real-time detection capability. Meanwhile, the proposed SFPN integrates seamlessly into existing image-only 3D detectors, offering a plug-and-play solution for improving monocular 3D object detection performance.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"180 ","pages":"Article 113389"},"PeriodicalIF":7.2,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144261981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}