{"title":"A Multi-Scale Sparse Channel Transformer Network for image reconstruction of astronomical bright source contamination","authors":"Yajuan Zhang , Congcong Shen , Xia Jiang , Bo Qiu , Ali Luo , Fuji Ren , Yuanlu Chen","doi":"10.1016/j.engappai.2025.112119","DOIUrl":"10.1016/j.engappai.2025.112119","url":null,"abstract":"<div><div>Bright source contamination has long been a challenging issue in the field of image processing, particularly in applications such as astronomical observations, satellite imaging, and nighttime surveillance. To address this issue, this paper proposes a novel Multi-Scale Sparse Channel Transformer Network (MSCformer) aimed at achieving high-quality image reconstruction under the influence of bright source contamination. The network integrates a Top-k Sparse Attention mechanism with a Channel Attention module, enabling selective focus on the most informative features and adaptive weight allocation across channels. Additionally, a Multi-Scale Dual-Gate Feedforward Network is designed to further enhance the expression of valuable features while suppressing redundant information. Experimental results demonstrate that the proposed method exhibits outstanding performance in practical applications on the Sloan Digital Sky Survey (SDSS) photometric image dataset. Compared to existing state-of-the-art techniques, MSCformer achieves significant performance improvements, with a Peak Signal-to-Noise Ratio (PSNR) of 45.093 decibel(dB), a Structural Similarity Index Measure(SSIM) of 0.978, and a Pixel Average Absolute Error (PAAE) of 0.675. This not only significantly enhances the removal of bright source contamination in the field of astronomy but also provides important reference value for subsequent research in related domains.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"161 ","pages":"Article 112119"},"PeriodicalIF":8.0,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145004491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-view new energy vehicle form generation design method combining Kansei imagery and deep learning","authors":"Zihao Wang , Le Xi , Yifan Ding , Wenjie Fang , Kaiming Wang , Hongliang Zuo","doi":"10.1016/j.engappai.2025.112115","DOIUrl":"10.1016/j.engappai.2025.112115","url":null,"abstract":"<div><div>In the competitive landscape of new energy vehicles, exterior design has become a crucial differentiator amid functional homogenization. User preferences are central to shaping vehicle appearance, yet most perceptual design methods rely on a single viewpoint, limiting insights into complex preference patterns. This study proposes a multi-perspective mapping approach that integrates Kansei engineering with deep learning.</div><div>Firstly, user core imagery is collected and mined through big data. Secondly, Kernels Network (KNet) semantic segmentation model, Residual Networks (ResNet) tri-view (front/side/rear) score prediction model and fully connected network (FCN) feature fusion model are integrated to construct a multi-view feature mapping system. Finally, the optimal combination of morphological elements is explored based on the Elite Genetic Algorithm (EGA), and the scheme is validated through generative artificial intelligence (AI) workflow.</div><div>The experimental results demonstrate that, employing “Cool” as a case study, the three-view scheme and the combination scheme devised by this research process exhibit substantial superiority over the majority of the samples. Under identical parameters, the scheme with decision constraints surpasses the randomly generated scheme in terms of perceptual scores and stability. The performance of the test set and the experimental results collectively substantiate the model’s validity.</div><div>This workflow—covering preference extraction, morphological decomposition, AI-driven generation, and validation—provides a scalable framework for new energy vehicle exterior design. It also demonstrates novel applications of Kansei engineering in multi-view fusion and generative form design.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"161 ","pages":"Article 112115"},"PeriodicalIF":8.0,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145004493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Key region Semantic information Augmented Transformer for Image Captioning","authors":"Fuyun Deng, Wei Li, Zhixin Li","doi":"10.1016/j.engappai.2025.112135","DOIUrl":"10.1016/j.engappai.2025.112135","url":null,"abstract":"<div><div>Existing image captioning models often face difficulties in capturing inter-object relationships and generating description that comprehensively understands the entire image content, either relying on object detectors that overlook contextual information or depending on grid features that fail to adequately model spatial interactions. This paper proposes two solutions to these challenges. The first is the introduction of a module for mining semantic information from key regions. Based on the spatial proximity and high co-occurrence between objects, this module identifies the public region covered by these objects as a key region, mines their semantic information, and incorporates it into the modeling process, which compensates for the limitations of grid features. Second, we improve the standard Transformer decoder’s architecture by innovatively introducing an adaptive gating mechanism that dynamically adjusts the alignment between textual and visual features, enhancing the model’s overall comprehension of the image. To validate our approach, we applied these modules to the Transformer framework and proposed a novel method for image captioning, called Key region Semantic information Augmented Transformer (KSAT) for Image Captioning. Extensive experiments on benchmark datasets show that the proposed method outperforms many models. Specifically, our method achieves a score of 139.6% on the offline test, and 138.4% on the official online test server on the Consensus-based Image Description Evaluation (CIDEr) metric. In qualitative evaluation, our method also outperforms other methods at generating captions for complex scenes. Overall, these results confirm the validity of our method and advance the field of artificial intelligence.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"161 ","pages":"Article 112135"},"PeriodicalIF":8.0,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145004494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DI-Retinex: Digital-Imaging Retinex Model for Low-Light Image Enhancement","authors":"Shangquan Sun, Wenqi Ren, Jingyang Peng, Fenglong Song, Xiaochun Cao","doi":"10.1007/s11263-025-02542-z","DOIUrl":"https://doi.org/10.1007/s11263-025-02542-z","url":null,"abstract":"<p>Many existing methods for low-light image enhancement (LLIE) based on Retinex model ignore important factors that affect the validity of this model in digital imaging, such as noise, quantization error, non-linearity, and dynamic range overflow. In this paper, we propose a new expression called Digital-Imaging Retinex model (DI-Retinex) through theoretical and experimental analysis of Retinex model in digital imaging. Our new expression includes an offset term in the enhancement model, which allows for pixel-wise brightness contrast adjustment with a non-linear mapping function. In addition, to solve the low-light enhancement problem in an unsupervised manner, we propose an image-adaptive masked degradation loss in Gamma space. We also design a variance suppression loss for regulating the additional offset term. Extensive experiments show that our proposed method outperforms all existing unsupervised methods in terms of visual quality, model size, and speed. Our algorithm can also assist downstream face detectors in low-light, as it shows the most performance gain after the low-light enhancement compared to other methods. We have released our code and model weights on https://github.com/sunshangquan/Di-Retinex.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"42 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145007129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single-channel speech denoising by masking the colored spectrograms","authors":"Sania Gul , Muhammad Salman Khan","doi":"10.1016/j.compeleceng.2025.110656","DOIUrl":"10.1016/j.compeleceng.2025.110656","url":null,"abstract":"<div><div>Speech denoising (SD) covers the algorithms that remove the background noise from the target speech and thus improve its quality and intelligibility. In this paper, a novel SD technique is proposed that masks the colored spectrogram. U-Net (a deep neural network fundamentally developed for image segmentation) is trained on the noisy log-powered colored spectrograms (LPcS), using the binarized Mel spectrograms as ground truth (GT). After training, the colored spectrogram of the noisy speech is passed through U-Net, which generates a soft mask at its output. This mask is applied to the magnitude matrix of the short-time Fourier transform (STFT) of the noisy speech to retrieve the magnitude matrix of the estimated speech. This matrix is later combined with the noisy phase matrix to recover the target speech. The results show that with masking-based targets, the colored spectrograms provide an improvement of 0.12 points in perceptual evaluation of speech quality (PESQ) score, 4 % in short time objective intelligibility (STOI), and a 163 times reduction in network learnable parameters, as compared to when they are processed by a mapping-based model using pix2pix generative adversarial network (GAN) followed by a feedforward regression neural network. With a slightly reduced PESQ score (by 0.58 points), the proposed model offers an improvement of 2 % in STOI, and 4375 and 1135 times reduction respectively in the required number of training epochs and network parameters when compared to a GAN-based model augmented by WavLM; a large-scale self-supervised learning model. Similarly, it offers an improvement of 1 % in STOI and a reduction of 33 and 200 times, respectively, in network size and training epochs when compared to a complex variational U-Net-based model. Also, with comparable PESQ, the proposed system offers almost 2 % improvement in STOI, and a 2 times reduction in network size and 100 times reduction in training epochs, when compared to a lightweight system using automatic dimension reduction of network layers by a structured pruning method.</div></div>","PeriodicalId":50630,"journal":{"name":"Computers & Electrical Engineering","volume":"128 ","pages":"Article 110656"},"PeriodicalIF":4.9,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145004450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate Time Series forecasting based on temporal decomposition and graph neural network","authors":"Yan Qiao , Pei Zhao , Junjie Wang , Rongyao Hu , Minyue Li , Xinyu Yuan , Meng Li , Zhenchun Wei , Cuiying Feng","doi":"10.1016/j.engappai.2025.112074","DOIUrl":"10.1016/j.engappai.2025.112074","url":null,"abstract":"<div><div>It is quite challenging to forecast the Multivariate Time Series (MTS) accurately due to the high dimensionality of MTS and the entangled correlation between variables. Recently, graph-based networks have been demonstrated to be an effective model to handle the complex correlations between MTS. However, all existing graph-based methods construct the graph model of the MTS using only the shallow correlations from the raw MTS data, ignoring the deep-rooted correlations hidden in the features. In this paper, we propose for the first time to construct a comprehensive graph model of MTS that incorporates both shallow correlations from raw data and hidden correlations from decomposed temporal properties. Then, we propose a novel graph-based MTS forecasting framework, which optimizes the graph structure jointly with the model parameters. By doing so, the graph structure can adaptively model the correlations of MTS at a deep level, while the joint optimization can make the constructed graph compatible with the forecasting tasks of MTS, contributing to a globally optimal solution. Finally, we conduct extensive experiments on seven real-world datasets, the results demonstrate the superiority of our method on MTS forecasting over the state-of-the-art baselines. The source codes of the experiments with datasets are available at <span><span>https://github.com/ironweng/MF-TDGNN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"161 ","pages":"Article 112074"},"PeriodicalIF":8.0,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145004537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal-frequency joint hierarchical transformer with dynamic windows for speech emotion recognition","authors":"Yonghong Fan , Heming Huang , Huiyun Zhang , Ziqi Zhou","doi":"10.1016/j.engappai.2025.112152","DOIUrl":"10.1016/j.engappai.2025.112152","url":null,"abstract":"<div><div>Speech Emotion Recognition (SER) aims to identify the emotional state of a speaker from speech signals, serving as a critical prerequisite for achieving natural human–computer interaction. In speech signals, emotional information is inherently distributed across diverse frequency bands and temporal scales, with emotional cues in distinct regions exhibiting varying levels of heterogeneity or interdependence. Existing Transformer-based methods face limitations in precisely localizing salient temporal-frequency regions and modeling their inter-regional relationships. To address these challenges, a temporal-frequency joint hierarchical Transformer with dynamic window mechanisms, abbreviated as TF-DWFormer, is proposed to capture critical emotional cues and their contextual dependencies across temporal-frequency dimensions. It operates through several principal phases: Firstly, a feature reconstruction module is designed to extract temporal, frequency, and temporal-frequency representations of emotional speech. Secondly, a high-low frequency-based emotion-aware partitioning strategy is designed to achieve the division of emotional regions. Thirdly, a local window within a hierarchical Transformer analyzes static intra-region correlations to capture fine-grained emotional patterns, while a dynamic window adaptively models temporal evolution across regions, learning dynamic inter-region relationships. Lastly, a dual-cross-attention mechanism is employed to synergize comprehensive emotion representation from different domains. Our evaluation experiments demonstrate that the proposed TF-DWFormer method achieves recognition accuracies of 73.68%, 91.67%, 92.59%, 74.42%, and 50.54% on the datasets IEMOCAP, CASIA, EMODB, eNTERFACE05, and MELD, respectively, outperforming existing SER methods. These results confirm the capability of TF-DWFormer to precisely localize salient regions, robustly model inter-region dependencies, and effectively fuse multi-domain information, providing a promising solution for advancing SER technology.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"161 ","pages":"Article 112152"},"PeriodicalIF":8.0,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145004538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning-enabled performance-based design of three-dimensional printed engineered cementitious composites","authors":"Wenguang Chen , Long Liang , Junhong Ye , Lingfei Liu , Neven Ukrainczyk , Liqiang Yin , Jiangtao Yu , Kequan Yu","doi":"10.1016/j.engappai.2025.112117","DOIUrl":"10.1016/j.engappai.2025.112117","url":null,"abstract":"<div><div>The superior tensile ductility of engineered cementitious composites (ECC) offers a promising solution to the challenge of integrating conventional steel reinforcement in three-dimensional (3D) concrete printing (3DCP). However, the widespread adoption of 3D printed ECC (3DP-ECC) is hindered by the reliance on trial-and-error design process. The complex material component and inherent anisotropy of 3DP-ECC pose challenges for accurate property prediction and inverse design. This paper introduces a performance-based design strategy for 3DP-ECC, leveraging machine learning (ML) and multi-objective optimization. The anisotropic-mechanical properties including compressive strength and flexural strength were experimentally and statistically investigated; further, ML prediction models conbined with multi-objective optimization algorithm were developed to inversely design 3DP-ECC for specific mechanical performance requirements, while reducing carbon footprint and material cost. Specifically, an extensive database was assembled, followed by grey relational analysis (GRA) to identify the parametric sensitivity of the mechanical properties of 3DP-ECC. Three representative ML techniques were employed, with the back-propagation artificial neural network (BPANN) demonstrating superior predictive accuracy. Model interpretability analyses uncovered the importance of input parameters and their influence on predicted outcomes. Lastly, non-dominated Sorting Genetic Algorithm II (NSGA-II) integrated with the BPANN models was applied to perform the inverse design of 3DP-ECC, showing good effectiveness and accuracy. This work offers an efficient and viable avenue for performance-based design for 3DP-ECC, along with the potential to develop low-carbon cost-effective 3DP-ECC.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"161 ","pages":"Article 112117"},"PeriodicalIF":8.0,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145004487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coarse-to-fine dual-branch network for ship target recognition in complex environments","authors":"Yang Tian, Hao Meng","doi":"10.1016/j.engappai.2025.112120","DOIUrl":"10.1016/j.engappai.2025.112120","url":null,"abstract":"<div><div>Harsh sea conditions and the complex and variable positions of ships significantly impact the capacity of imaging devices to capture high-quality ship images, making ship target recognition challenging in the application of artificial intelligence. Many scholars have recently proposed cascaded recognition models to address this issue. Following this method, in this paper, we propose a novel method for ship target recognition in complex environments called the coarse-to-fine dual-branch (CFDB) network. The CFDB model designs a dual-branch network from coarse to fine to lock the target area fine features and then uses peer-to-peer communication to extract and exchange learning of the target region’s final discriminative contour features, assisting in predicting ship classes in the complex environment. The proposed method is evaluated on the constructed complex in background ships (CIB-ships) dataset and the publicly available Marine Argos Recognition Ships (MAR-ships) and Game-of-Ships datasets. Compared with the suboptimal method, the proposed CFDB network exhibits improvements of 2.11%, 1.33%, and 1.24% accuracy on the CIB-ships, MAR-ships, and Game-of-ships datasets, respectively. The results demonstrate that the proposed method provides useful ideas for the dynamic monitoring of ships in real environments. Our code will be published at <span><span>https://github.com/yangt1013/CFDB-master</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"161 ","pages":"Article 112120"},"PeriodicalIF":8.0,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145004492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep reinforcement learning-based strategic bidding in electricity markets via variational autoencoder-assisted competitor behavior learning","authors":"Fei Hu, Yong Zhao, Yaowen Yu, Yuanzheng Li","doi":"10.1016/j.engappai.2025.112205","DOIUrl":"10.1016/j.engappai.2025.112205","url":null,"abstract":"<div><div>In a deregulated electricity market, self-interested producers have incentives to offer strategically for maximizing their own profits. While deep reinforcement learning (DRL) has shown great potential for solving such strategic bidding problems, existing methods typically oversimplify strategic action spaces and neglect the influence of competitors' offering behaviors. To bridge these gaps, this paper proposes a novel DRL-based framework to model and solve the strategic bidding problem of an individual producer by jointly considering price-quantity offering actions and the dynamic behaviors of market competitors. First, a bilevel optimization model is formulated to incorporate offering actions on price-quantity pairs. Then, a data-driven framework that combines a variational autoencoder with a density-based clustering method is proposed to learn and capture competitors' offering behaviors. Finally, an imitation learning-integrated DRL algorithm is developed to improve learning stability and solution quality for strategic bidding with price-quantity actions and competitors' offering behaviors. Case studies on the IEEE-30 bus system show that the proposed framework obtains a 28.12 k$ (24.25 %) increase in average profit compared to the existing approach, demonstrating its effectiveness and adaptability under dynamic market conditions.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"161 ","pages":"Article 112205"},"PeriodicalIF":8.0,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145004495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}