{"title":"Traffic scene perception via multimodal large language model with data augmentation and efficient training strategy","authors":"Shuo Liu , Lei Shi , Yucheng Shi , Yufei Gao , Xiaole Sun","doi":"10.1016/j.asoc.2025.113210","DOIUrl":"10.1016/j.asoc.2025.113210","url":null,"abstract":"<div><div>Intelligent mobility, driven by advancements in deep learning and computing power, enhances transportation efficiency and societal connectivity, fostering economic and urban development. Current computer vision solutions often struggle to capture the complex details or understand the context within traffic scenes, limiting advanced intelligent mobility and raising safety concerns. Multimodal Large Language Models (MLLMs), by integrating linguistic and visual data, can aid vehicles and transportation systems in gaining a deeper understanding of the real-world traffic scenes, offering solutions to current challenges. Nevertheless, existing approaches predominantly employ MLLMs as instruments for querying and engaging with traffic infrastructure, rather than empowering these models to genuinely comprehend the traffic environment. This limitation curtails the potential of MLLMs and may even pose safety risks. In this paper, we first introduce a data augmentation framework designed to transform raw data into datasets suited for specific training objectives, thereby addressing issues related to data scarcity. Secondly, we propose a learning rate-based staged training strategy that segments the training process into distinct stages. This strategy involves deploying datasets targeted at various training objectives according to the patterns of parameter changes observed in different stages, thereby enhancing the training efficiency of the model. Utilizing these methods, we present InsightGPT, a model endowed with robust understanding and reasoning capabilities in traffic scenarios. 
In experiments conducted across six tasks, InsightGPT consistently outperforms baseline MLLMs in evaluating both overall traffic scenes and the individual objects within them, demonstrating its superior traffic comprehension and reasoning abilities. InsightGPT’s parameters and deployment details are available at https://github.com/JinleLiu/InsightGPT.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"177 ","pages":"Article 113210"},"PeriodicalIF":7.2,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic real-time aiming strategy optimization of multi-horizons heliostat fields","authors":"Yi’an Wang , Zhe Wu , Dong Ni","doi":"10.1016/j.asoc.2025.113215","DOIUrl":"10.1016/j.asoc.2025.113215","url":null,"abstract":"<div><div>The cloud prediction error is the dominant factor causing uncertainty and tracking errors in the heliostat aiming strategy optimization of Solar Power Tower (SPT) plants. In this work, an effective and scalable optimization method is proposed to address these cloud prediction errors. Specifically, a Multi-step Unscented Kalman filter (Mt-UKF) is developed to predict the SPT output flux trajectory under the influence of cloud prediction errors. Additionally, an improved Grey Wolf Optimization (GWO) algorithm is proposed, which integrates a reconfigured Grey Wolf social hierarchy with a Dimensionality Extension Learning (DEL) mechanism. This improvement enables the feedback correction of optimization errors in the heliostat field caused by cloud prediction errors. A simulated heliostat field is introduced as the experimental scenario to validate the proposed method. The Dimensionality Extension Learning Grey Wolf Optimization (DEL-GWO) algorithm is compared against four other state-of-the-art swarm intelligence algorithms. Experimental results and statistical tests demonstrate that the Mt-UKF combined with DEL-GWO exhibits high competitiveness and significantly outperforms the other algorithms. 
This combination effectively mitigates tracking errors induced by cloud prediction errors, demonstrating its robustness and applicability for heliostat field optimization.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"177 ","pages":"Article 113215"},"PeriodicalIF":7.2,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143947776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A lightweight YOLOv8-based model with Squeeze-and-Excitation Version 2 for crack detection of pipelines","authors":"Zhaochao Li , Linxuan Xiao , Meiling Shen , Xiya Tang","doi":"10.1016/j.asoc.2025.113260","DOIUrl":"10.1016/j.asoc.2025.113260","url":null,"abstract":"<div><div>Crack detection is crucial for buried pipelines that transport water, gas, oil, etc. However, traditional detection methods may lack accuracy and robustness for pipelines in low-light conditions and against complex backgrounds. This study proposes a YOLOv8-GhostConv-SEV2 model based on the lightweight YOLOv8n framework, which optimizes feature extraction by introducing the GhostConv module and enhances noise suppression capability with the SEV2 (Squeeze-and-Excitation Version 2) attention mechanism. On a dataset of 11,135 images of pipelines in a low-light environment, the proposed model achieves 98.1 % (+2.62 %) precision, 95.7 % (+3.80 %) recall, 0.969 (+3.30 %) F1 score, and 82.4 % (+11.49 %) mAP50–95 on the test set. Additionally, the improved model is only 5.67 MB (-5.34 %) in size, making it lightweight and well suited to crack detection in buried pipelines in complex environments.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"177 ","pages":"Article 113260"},"PeriodicalIF":7.2,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual object tracking: Review and challenges","authors":"Zeshi Chen , Caiping Peng , Shuai Liu , Weiping Ding","doi":"10.1016/j.asoc.2025.113140","DOIUrl":"10.1016/j.asoc.2025.113140","url":null,"abstract":"<div><div>Visual object tracking is a challenging research topic in computer vision. Numerous visual tracking algorithms have been proposed to solve this problem and have achieved promising results. Traditional visual tracking algorithms can be categorized into generative and discriminative algorithms. Recently, deep learning based visual tracking algorithms have attracted great attention from researchers due to their excellent performance. To summarize the development of visual object tracking, some studies have analyzed non-deep learning and deep learning visual tracking algorithms. In this paper, the most advanced tracking algorithms are comprehensively summarized, including both non-deep learning and deep learning based algorithms. First, traditional non-deep learning based tracking algorithms are categorized into generative and discriminative methods. The generative algorithms are summarized from three perspectives: kernel series, subspace series and sparse representation series, and the discriminative algorithms are summarized from two perspectives: correlation filtering series and deep features series. Then, deep learning based algorithms are divided into Siamese network series and Transformer series. Siamese network based algorithms are summarized from different innovation directions, and Transformer based algorithms are summarized from two perspectives: CNN-Transformer and Fully-Transformer. Moreover, the commonly used datasets and evaluation indicators in visual object tracking are introduced, along with the results and analysis of representative algorithms. 
Finally, the challenges faced in visual object tracking are summarized and its future development trends are outlined.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"177 ","pages":"Article 113140"},"PeriodicalIF":7.2,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143941728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing the flexible job shop scheduling problem via deep reinforcement learning with mean multichannel graph attention","authors":"Dailin Huang , Hong Zhao , Jie Cao , Kangping Chen , Lijun Zhang","doi":"10.1016/j.asoc.2025.113128","DOIUrl":"10.1016/j.asoc.2025.113128","url":null,"abstract":"<div><div>Job shop scheduling plays a crucial role in manufacturing informatization. Recently, significant progress has been made in terms of optimizing flexible job shop scheduling problems (FJSPs) via deep reinforcement learning (DRL). However, the complex structures of the disjunctive graphs encountered in FJSPs introduce a large amount of redundant information, and their oversized action spaces further increase the difficulty of training. To address these issues, a mean multichannel graph attention-proximal policy optimization (MCGA-PPO) model is proposed. First, the channel graph attention (CGA) mechanism reduces the amount of redundant information, allowing the agent to focus on task-relevant critical information. Second, for the first time, the overestimation phenomenon observed in FJSPs is explored in depth, and the MCGA method is developed to address the issue of overestimation from a single direction. MCGA employs information weighted across multiple channels to balance the estimation process. Furthermore, to address large action spaces, an entropy loss is introduced to optimize the exploration and exploitation processes of the agent. 
The experimental results confirm that our proposed model provides performance improvements of 1.22% and 1.29% on synthetic and classic datasets, respectively, demonstrating its effectiveness in addressing complex FJSPs.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"177 ","pages":"Article 113128"},"PeriodicalIF":7.2,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Financial asset allocation strategies using statistical and Machine Learning Models: Evidence from comprehensive scenario testing","authors":"Bautista Penayo , Vedrana Pribičević , Andrej Novak","doi":"10.1016/j.asoc.2025.113193","DOIUrl":"10.1016/j.asoc.2025.113193","url":null,"abstract":"<div><div>Accurate return and risk forecasts are critical for asset allocation; however, traditional models such as Mean-Variance (MV) and Risk Parity (RP) suffer from significant estimation errors and sensitivity to noise. We address these challenges by comparing six asset allocation strategies—four MV configurations and two RP-based approaches—against an equally weighted benchmark, using 111 stocks from the NASDAQ-100 and NASDAQ Financial-100 indices over 2000–2019. Two of the MV strategies, one of which we introduce, combine both econometric and Machine Learning (ML) forecasts for returns (via Facebook Prophet) and volatility (via GARCH), while another established ML variation of RP uses Hierarchical Risk Parity (HRP). The proposed hybrid MV strategy combines interpretable, regulatory-compliant methods with ML methodology. Our hypothesis was that ML strategies would significantly outperform their simpler counterparts, and that our proposed MV approach would be particularly competitive. Scenario testing was performed to assess the generalizability of the strategies. 
Rigorous scenario testing—varying stock sets, training periods, and hyperparameter configurations—reveals that: (i) our ML-enhanced Maximum Sharpe Ratio (MSR) strategy achieves up to 1490% higher Return on Investment (ROI) than the benchmark and 1390%–1909% higher than alternative strategies; (ii) Prophet’s competitive Normalized Mean-Square Error (NMSE) values confirm its robustness in forecasting noisy data; (iii) ML approaches exhibit sensitivity to training data, with compound annual returns declining by up to 5.24% under alternative training periods, reflecting macroeconomic regime-switching effects; and (iv) while ML methods often produce higher absolute returns, they do not consistently yield improved risk-adjusted performance, with non-ML strategies sometimes matching or surpassing ML Sharpe Ratios (SR). Notably, HRP outperformed naïve RP in all scenarios, consistently delivering higher SR. Overall, while ML methods show strong potential, their effectiveness is contingent on data selection and regime stability—underscoring the need for robust scenario analyses such as the one presented.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"177 ","pages":"Article 113193"},"PeriodicalIF":7.2,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bio-inspired UAV swarm operation approach towards decentralized aerial electronic defense","authors":"Weizhi Ran, Sulemana Nantogma, Shangyan Zhang, Yang Xu","doi":"10.1016/j.asoc.2025.113136","DOIUrl":"10.1016/j.asoc.2025.113136","url":null,"abstract":"<div><div>Protecting critical assets and infrastructure from aerial attacks by adversaries is central to national defense strategy. In this regard, UAV swarms used for autonomous aerial electronic defense have the potential to distribute tasks and coordinate their operations to provide electronic countermeasures as an extra layer of defense. However, designing decentralized coordination for the swarm is a key bottleneck in UAV swarm electronic defense operations. This paper puts forward a decentralized, honey bee-inspired, multi-agent-based coordination design approach for UAV swarms in aerial electronic defense. The approach abstracts the coordination and planning of each UAV in the swarm as groups of agents with a hybrid hierarchical organization. Next, based on the behavior and operations of honey bees, a task planning model for coordination of the swarm is presented. Simulation results based on interception success rate, proportion of UAV working times, and interception path lengths show that the proposed approach is capable of abstracting the swarm coordination problem and achieving a generally optimized aerial electronic defense. 
This approach shows promising results in designing a decentralized, responsive, and lightweight UAV swarm system capable of providing electronic countermeasures.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"177 ","pages":"Article 113136"},"PeriodicalIF":7.2,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143922495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acoustic backdoor attacks on speech recognition via frequency offset perturbation","authors":"Yu Tang , Xiaolong Xu , Lijuan Sun","doi":"10.1016/j.asoc.2025.113188","DOIUrl":"10.1016/j.asoc.2025.113188","url":null,"abstract":"<div><div>With the increasing deployment of deep learning-based speech recognition systems, backdoor attacks have become a serious security threat, enabling adversaries to implant hidden triggers that activate malicious behaviors while preserving model performance on benign inputs. However, existing acoustic backdoor attacks, whether in the time or frequency domain, often struggle to achieve sufficient stealthiness, as poisoned samples either disrupt semantic integrity or introduce perceptible artifacts. Moreover, these methods typically fail to strike an effective balance among attack efficacy, stealthiness, and robustness. To address these limitations, we propose Shadow Frequency (SF), a novel backdoor attack that leverages psychoacoustic-guided frequency offset perturbations to inject imperceptible yet model-sensitive signals near dominant spectral components. This design ensures auditory imperceptibility while maintaining high attack effectiveness and robustness. 
Experimental results show that SF achieves over 96% ASR with minimal impact on clean data accuracy, and remains effective under common defenses, validating its practicality for real-world deployment.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"177 ","pages":"Article 113188"},"PeriodicalIF":7.2,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143922497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized connections and feature interactions for more efficient single-image desnowing","authors":"Jiawei Mao , Yuanqi Chang , Xuesong Yin , Binling Nie , Yigang Wang","doi":"10.1016/j.asoc.2025.113153","DOIUrl":"10.1016/j.asoc.2025.113153","url":null,"abstract":"<div><div>The challenge of single image desnowing primarily stems from the diversity and irregular shape of snow. While existing methods can effectively remove snow particles of various shapes, they often introduce distortion to the restored images. To address the challenges posed by the diverse shapes and sizes of snow particles, as well as the issue of distortion after desnowing, we propose a novel single image desnowing network called Star-Net. Our approach designs a Star type Skip Connection (SSC), which establishes information channels for different scale features. This design allows the network to aggregate all scale features, making it easier to handle snow particles with complex shapes and varying sizes. Additionally, we design a Multi-Stage Interactive Transformer (MIT) as the foundational module of Star-Net to solve image distortion. MIT explicitly models a range of essential image recovery features (e.g., local features, multi-scale features) by combining the advantages of convolution and attention mechanisms to restore regions of image distortion and further enhance the comprehension of different snow particle shapes and sizes. Furthermore, through experimental observations, we identify the presence of snow particle residuals within the SSC. To address this, we propose a Degenerate Filter Module (DFM) that filters out snow particle residuals in the SSC across spatial and channel domains. Extensive experiments on standard snow removal datasets and real-world datasets demonstrate that Star-Net achieves state-of-the-art performance on snow removal tasks. 
Importantly, our approach retains the original sharpness of the images while effectively removing snow.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"177 ","pages":"Article 113153"},"PeriodicalIF":7.2,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143922498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Swin Transformer based on multi-directional-shift window attention and inductive bias for diagnosis of pleural effusion","authors":"Zekun Tian , Dunlu Peng , Debby D. Wang , Linna Zhang , Zheng Zou , Hejing Huang , Shiqi Zhang","doi":"10.1016/j.asoc.2025.113146","DOIUrl":"10.1016/j.asoc.2025.113146","url":null,"abstract":"<div><div>In the field of healthcare, deep learning has shown promise in addressing diagnostic challenges. However, existing methods often struggle with generalization due to overfitting on non-discriminative features and limited datasets. To address these limitations, <em>Ultra-Multi-SWIN</em> is introduced as a novel deep learning model for pleural effusion diagnosis using ultrasound images. The model incorporates physician-inspired inductive biases into its architecture, enabling it to focus on discriminative features while avoiding overfitting to irrelevant information. Specifically, a multi-directional-shift window structure captures spatial features dependent on direction, and a MASK-based masking module suppresses redundant non-ultrasound features. A dataset comprising 50 subjects and four levels of pleural effusion severity (large, moderate, small, none) is established to evaluate the model’s performance. Experimental results demonstrate that <em>Ultra-Multi-SWIN</em> achieves state-of-the-art performance, with average accuracies of 0.988 (subject-dependent) and 0.952 (subject-independent). Visualization and ablation studies further confirm the model’s ability to generalize effectively by focusing on clinically relevant regions. 
The open-source code is released at Ultra-Multi-SWIN, promoting broader adoption and future research.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"177 ","pages":"Article 113146"},"PeriodicalIF":7.2,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}