Visible and thermal image fusion network with diffusion models for high-level visual tasks
Jin Meng, Jiahui Zou, Zhuoheng Xiang, Cui Wang, Shifeng Wang, Yan Li, Jonghyuk Kim
Applied Intelligence 55(4), published 2025-01-09. DOI: 10.1007/s10489-024-06210-6

Abstract: Fusion technology enhances the performance of applications such as security, autonomous driving, military surveillance, medical imaging, and environmental monitoring by combining complementary information. The fusion of visible and thermal (RGB-T) images is critical for improving human observation and downstream visual tasks. However, most semantics-driven fusion algorithms train segmentation and fusion jointly, which increases the computational cost and underutilizes semantic information. Designing a cleaner fusion architecture that mines rich deep semantic features is the key to addressing this issue. This paper proposes a two-stage RGB-T image fusion network with diffusion models. In the first stage, a diffusion model is employed to extract multiscale features, providing rich semantic features and texture edges for the fusion network. In the second stage, a semantic feature enhancement module (SFEM) and a detail feature enhancement module (DFEM) are proposed to improve the network's ability to describe small details, and an adaptive global-local attention mechanism (AGAM) is used to enhance the weights of key features related to visual tasks. The proposed algorithm is benchmarked on a new tri-modal sensor driving scene dataset (TSDS), which includes 15,234 sets of labeled images (visible, thermal, and polarization-degree images). The semantic segmentation model trained on the fused images achieved 78.41% accuracy, and the object detection model achieved 87.21% mAP. The experimental results indicate that the algorithm outperforms state-of-the-art image fusion algorithms.
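The abstract does not detail the AGAM internals, but a minimal PyTorch sketch can illustrate the general idea of an adaptive global-local attention block: a global (channel) branch and a local (spatial) branch whose outputs are blended by a learnable weight. All layer choices here are assumptions for illustration, not the paper's module.

# A minimal sketch of an adaptive global-local attention block (assumed structure).
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        # Global branch: squeeze-and-excitation style channel attention.
        self.global_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Local branch: spatial attention from a small convolution.
        self.local_conv = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Learnable scalar that balances the two branches.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        g = self.global_fc(x)          # (B, C, 1, 1) channel weights
        l = self.local_conv(x)         # (B, 1, H, W) spatial weights
        a = torch.sigmoid(self.alpha)  # keep the mixing weight in (0, 1)
        return x * (a * g + (1 - a) * l)

# Example: reweighting fused visible/thermal features of shape (B, 64, H, W).
feat = torch.randn(2, 64, 128, 128)
out = GlobalLocalAttention(64)(feat)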
Adaptive spiking neuron with population coding for a residual spiking neural network
Yongping Dan, Changhao Sun, Hengyi Li, Lin Meng
Applied Intelligence 55(4), published 2025-01-09. DOI: 10.1007/s10489-024-06128-z

Abstract: Spiking neural networks (SNNs) have attracted significant research attention due to their inherent sparsity and event-driven processing capabilities. Recent studies indicate that incorporating convolutional and residual structures into SNNs can substantially enhance performance. However, these converted spiking residual structures bring increased complexity and stacks of parameterized spiking neurons. To address this challenge, this paper proposes a meticulously refined two-layer decision structure for residual-based SNNs, consisting solely of fully connected and spiking neuron layers. Specifically, the spiking neuron layers incorporate a dynamic leaky integrate-and-fire (DLIF) neuron model with a nonlinear self-feedback mechanism, characterized by dynamic threshold adjustment and a self-regulating firing rate. Furthermore, diverging from traditional direct encoding, which focuses solely on individual neuronal frequency, we introduce a mixed coding mechanism that combines direct encoding with multi-neuron population decoding. The proposed architecture improves the adaptability and responsiveness of spiking neurons in various computational contexts. Experimental results demonstrate the efficacy of the approach: although it uses a highly simplified structure with only 6 timesteps, it outperforms multiple state-of-the-art methods, with accuracy improvements of 0.01-1.99% on three static datasets and 0.14-7.50% on three N-datasets. The DLIF model excels in information processing, showing twice the mutual information of other neuron models. On the sequential MNIST dataset, it balances biological realism and practicality, enhancing memory and dynamic range. The proposed method offers improved computational efficiency and a simplified network structure, enhances the biological plausibility of SNN models, and can be easily adapted to other deep SNNs.
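The DLIF equations are not given in the abstract; the following NumPy sketch illustrates the general mechanism it describes, a leaky integrate-and-fire neuron whose threshold jumps after each spike and decays back to a baseline, self-regulating the firing rate. The decay constants and the threshold-update rule are assumptions, not the paper's equations.

# A minimal sketch of a leaky integrate-and-fire neuron with a dynamic threshold.
import numpy as np

def dlif_run(inputs, v_decay=0.9, th_decay=0.95, th_base=1.0, th_jump=0.3):
    v, th = 0.0, th_base
    spikes = []
    for x in inputs:
        v = v_decay * v + x              # leaky integration of the input current
        s = 1.0 if v >= th else 0.0      # fire when the membrane potential crosses the threshold
        v = v * (1.0 - s)                # hard reset after a spike
        # Dynamic threshold: jumps after a spike, then decays back to baseline,
        # which self-regulates the firing rate.
        th = th_base + th_decay * (th - th_base) + th_jump * s
        spikes.append(s)
    return np.array(spikes)

print(dlif_run(np.full(20, 0.4)))  # constant drive -> adapting spike train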
EMD empowered neural network for predicting spatio-temporal non-stationary channel in UAV communications
Qiuyun Zhang, Qiumei Guo, Hong Jiang, Xinfan Yin, Muhammad Umer Mushtaq, Ying Luo, Chun Wu
Applied Intelligence 55(4), published 2025-01-09. DOI: 10.1007/s10489-024-06165-8

Abstract: This paper introduces a novel prediction method for spatio-temporal non-stationary channels between unmanned aerial vehicles (UAVs) and ground control vehicles, essential for the fast and accurate acquisition of channel state information (CSI) to support UAV applications in ultra-reliable and low-latency communication (URLLC). Specifically, an empirical mode decomposition (EMD)-empowered spatio-temporal attention neural network, referred to as EMD-STANN, is proposed. The STANN sub-module within EMD-STANN is designed to capture the spatial correlation and temporal dependence of the CSI. The EMD component handles the non-stationary and nonlinear dynamic characteristics of the UAV-to-ground-control-vehicle (U2V) channel, thereby enhancing the feature extraction and refinement capabilities of the STANN and improving the accuracy of CSI prediction. We validated the proposed EMD-STANN model on multiple datasets. The results indicate that EMD-STANN adapts effectively to diverse channel conditions and accurately predicts channel states. Compared to existing methods, EMD-STANN exhibited superior predictive performance, with lower root mean square error (RMSE) and mean absolute error (MAE): under our simulation conditions, it reduced RMSE by 24.66% and MAE by 25.46% relative to the reference method. This improvement in prediction accuracy provides a solid foundation for the implementation of URLLC applications.
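As a rough illustration of the EMD front end, the sketch below decomposes a toy non-stationary series (standing in for a CSI magnitude trace) into intrinsic mode functions that a downstream predictor could consume per mode. The PyEMD package and the toy signal are assumptions for illustration; the paper does not name its EMD implementation.

# A minimal sketch of EMD as a preprocessing step for a non-stationary series.
import numpy as np
from PyEMD import EMD  # assumed implementation; pip install EMD-signal

t = np.linspace(0, 1, 512)
# Toy non-stationary signal standing in for a CSI magnitude trace.
csi = (np.sin(2 * np.pi * 5 * t)
       + 0.5 * np.sin(2 * np.pi * 40 * t * (1 + t))
       + 0.1 * np.random.randn(t.size))

imfs = EMD().emd(csi)          # array of intrinsic mode functions, shape (n_imfs, 512)
print(imfs.shape)

# Each IMF (plus the residue) would be windowed and fed to the spatio-temporal
# attention network as a separate, more stationary input channel.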
F2RAIL: panoptic segmentation integrating Fpn and transFormer towards RAILway
Bai Dingyuan, Guo Baoqing, Ruan Tao, Zhou Xingfang, Sun Tao, Wang Yu, Liu Tao
Applied Intelligence 55(4), published 2025-01-09. DOI: 10.1007/s10489-024-06158-7

Abstract: Panoptic segmentation enables precise identification and localization of the various elements in railway scenes by assigning a unique mask to each object in the image, thereby providing crucial data support for autonomous perception tasks in railway environments. However, existing segmentation methods fail to effectively leverage the prominent boundary and linear features of objects such as railway tracks and guardrails, resulting in unsatisfactory segmentation performance in railway scenes. Moreover, the inherent structural limitations of generic segmentation methods lead to weak feature extraction capabilities. Accordingly, this paper proposes the F2RAIL panoptic segmentation network, which unifies multi-scale detection and high-precision recognition through a fusion of Feature Pyramid Networks (FPN) and transformer networks. An edge feature enhancement module addresses the insufficient use of linear features in railway scenes, and a multi-dimensional enhancement module resolves the weakening or loss of deep feature information in segmentation models. With these structural innovations and methodological improvements, F2RAIL achieved a panoptic quality (PQ) of 43.74% on our custom railway dataset, a 2.2% improvement over existing state-of-the-art (SOTA) methods, and performed comparably to SOTA methods on public benchmark datasets.
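The paper's edge feature enhancement module is not specified in the abstract; the sketch below shows one plausible form of the idea, re-weighting a feature map by its Sobel edge responses so that thin linear structures such as rails and guardrails are emphasised. It is an illustrative stand-in, not the F2RAIL module.

# A minimal sketch of Sobel-based edge enhancement of a feature map (assumed form).
import torch
import torch.nn.functional as F

def edge_enhance(feat):
    # feat: (B, C, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    c = feat.shape[1]
    kx = kx.reshape(1, 1, 3, 3).repeat(c, 1, 1, 1)
    ky = ky.reshape(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(feat, kx, padding=1, groups=c)   # per-channel horizontal gradient
    gy = F.conv2d(feat, ky, padding=1, groups=c)   # per-channel vertical gradient
    edges = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)
    return feat * (1.0 + torch.sigmoid(edges))     # boost responses along edges

out = edge_enhance(torch.randn(1, 32, 64, 64))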
DeepSCNN: a simplicial convolutional neural network for deep learning
Chunyang Tang, Zhonglin Ye, Haixing Zhao, Libing Bai, Jingjing Lin
Applied Intelligence 55(4), published 2025-01-08. DOI: 10.1007/s10489-024-06121-6

Abstract: Graph convolutional neural networks (GCNs) are deep learning methods for processing graph-structured data. GCNs mainly consider pairwise connections and ignore higher-order interactions between nodes. Recently, simplices have been shown to encode not only pairwise relations between nodes but also higher-order interactions among them, and researchers have turned to designing simplicial convolutional neural networks. Existing simplicial neural networks achieve good performance in tasks such as missing value imputation, graph classification, and node classification. However, due to vanishing gradients, over-smoothing, and over-fitting, they are typically limited to very shallow models. We therefore propose a simplicial convolutional neural network for deep learning (DeepSCNN). First, a simplicial edge sampling technique (SES) is introduced to prevent the over-fitting caused by deepening the network. Then, initial residual connections are added to the simplicial convolutional layers. Finally, to verify the validity of DeepSCNN, we conduct missing data imputation and node classification experiments on citation networks and compare its performance with simplicial neural networks (SNN) and simplicial convolutional networks (SCNN). The results show that the proposed DeepSCNN outperforms SNN and SCNN.
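For readers unfamiliar with simplicial convolutions, the NumPy sketch below performs one convolution step on edge signals using the Hodge 1-Laplacian of a single filled triangle. The toy complex, weights, and activation are assumptions for illustration; DeepSCNN additionally applies simplicial edge sampling and initial residual connections.

# A minimal sketch of one simplicial convolution step on edge signals.
import numpy as np

# Boundary matrices of a single filled triangle on nodes {0, 1, 2}:
# B1 maps edges -> nodes, B2 maps triangles -> edges.
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], dtype=float)   # edges: (0,1), (0,2), (1,2)
B2 = np.array([[ 1], [-1], [ 1]], dtype=float)

L1 = B1.T @ B1 + B2 @ B2.T                   # Hodge 1-Laplacian (lower + upper parts)

x = np.array([[0.5], [1.0], [-0.2]])         # one feature per edge
W = np.array([[0.8]])                        # learnable weight (1 -> 1 channel)

h = np.tanh(L1 @ x @ W)                      # one simplicial convolution layer
print(h.ravel())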
Utility-based agent model for intermodal behaviors: a case study for urban toll in Lille
Azise Oumar Diallo, Guillaume Lozenguez, Arnaud Doniec, René Mandiau
Applied Intelligence 55(4), published 2025-01-08. DOI: 10.1007/s10489-024-05869-1

Abstract: To reduce congestion and pollution in cities, political authorities encourage the modal shift away from private cars toward sustainable trip behaviors such as intermodality (combinations of private cars and public transport). Coercive measures such as urban tolls are also an increasingly investigated solution. To avoid toll taxes, agents select intermodal transportation (private car plus public transport) by parking their vehicles in park-and-ride (PR) facilities at the entrance to the toll area. This paper proposes a methodology for an agent-based model (ABM), in particular a utility-based agent model, to reproduce intermodal trip behaviors in a city and to assess the impact of an urban toll. A multinomial logit model, coupled with the agent- and activity-based simulation tool MATSim, is used to determine the modal choice of each agent. Based on open data for the European Metropolis of Lille (MEL), the simulation shows that a toll tax of 20 € (21.75 $) is sufficient to reduce the use of private vehicles by 20%.
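A minimal sketch of the multinomial-logit mode choice underlying such utility-based agents: each agent picks car, intermodal (car + park-and-ride + public transport), or public transport with probability proportional to the exponential of its utility, so adding a toll to the car cost shifts probability toward the intermodal option. The coefficients and the example trip are illustrative assumptions, not the calibrated MEL values.

# A minimal sketch of multinomial-logit mode choice with a toll on the car alternative.
import numpy as np

def mode_probabilities(time_min, cost_eur, beta_time=-0.05, beta_cost=-0.3, asc=None):
    asc = asc or {"car": 0.0, "intermodal": -0.4, "pt": -0.6}   # alternative-specific constants
    u = {m: asc[m] + beta_time * time_min[m] + beta_cost * cost_eur[m] for m in asc}
    v = np.array(list(u.values()))
    p = np.exp(v - v.max())                  # softmax over utilities
    return dict(zip(u, p / p.sum()))

# Trip into the toll area: adding a 20 EUR toll to the car alternative
# shifts probability mass toward the intermodal option.
times = {"car": 25, "intermodal": 35, "pt": 50}
print(mode_probabilities(times, {"car": 3.0, "intermodal": 4.5, "pt": 1.8}))
print(mode_probabilities(times, {"car": 23.0, "intermodal": 4.5, "pt": 1.8}))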
Enhancing few-shot learning using targeted mixup
Yaw Darkwah Jnr., Dae-Ki Kang
Applied Intelligence 55(4), published 2025-01-08. DOI: 10.1007/s10489-024-06157-8

Abstract: Despite the attention that long-tailed classification has received in recent years, the tail classes still suffer more than the remaining classes. We address this problem with a novel data augmentation technique called Targeted Mixup, which mixes class samples based on the model's performance on each class. Instances of two classes that are difficult to distinguish are randomly chosen and linearly interpolated to produce a new sample, so that the model pays attention to those two classes and learns the features that distinguish them, improving the classification of their instances. To demonstrate the efficiency of the proposed method empirically, we performed experiments on the CIFAR-100-LT, Places-LT, and Speech Commands-LT datasets. The results show an improvement on the few-shot classes without sacrificing much of the model's performance on the many-shot and medium-shot classes; in fact, overall accuracy increased as well.
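A minimal NumPy sketch of the targeted-mixup idea, under stated assumptions: pick the pair of classes the model currently confuses most (here read off a validation confusion matrix) and linearly interpolate one sample from each, producing a mixed input with a soft label over the two classes. The pair-selection rule and all names are illustrative, not the paper's exact procedure.

# A minimal sketch of mixing samples from the two most-confused classes.
import numpy as np

def targeted_mixup(x_by_class, confusion, alpha=0.4, rng=np.random.default_rng(0)):
    # Select the off-diagonal entry with the most confusion.
    conf = confusion.astype(float).copy()
    np.fill_diagonal(conf, 0.0)
    a, b = np.unravel_index(conf.argmax(), conf.shape)
    a, b = int(a), int(b)
    xa = x_by_class[a][rng.integers(len(x_by_class[a]))]
    xb = x_by_class[b][rng.integers(len(x_by_class[b]))]
    lam = rng.beta(alpha, alpha)
    x_new = lam * xa + (1.0 - lam) * xb          # mixed input
    y_new = {a: lam, b: 1.0 - lam}               # soft label over the two classes
    return x_new, y_new

samples = {0: np.random.randn(10, 32), 1: np.random.randn(10, 32), 2: np.random.randn(10, 32)}
cm = np.array([[50, 2, 1], [3, 40, 9], [1, 12, 30]])   # classes 1 and 2 are confused
print(targeted_mixup(samples, cm)[1])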
A comparative study of handling imbalanced data using generative adversarial networks for machine learning based software fault prediction
Ha Thi Minh Phuong, Pham Vu Thu Nguyet, Nguyen Huu Nhat Minh, Le Thi My Hanh, Nguyen Thanh Binh
Applied Intelligence 55(4), published 2025-01-08. DOI: 10.1007/s10489-024-05930-z

Abstract: Software fault prediction (SFP) is the process of identifying potentially defect-prone modules before the testing stage of the software development process. By identifying faults early, software engineers can focus their efforts on the components most likely to contain defects, thereby improving the overall quality and reliability of the software. However, data imbalance and feature redundancy are challenging issues in SFP that can degrade the performance of fault prediction models. Imbalanced software fault datasets, in which the number of normal modules (majority class) is significantly higher than that of faulty modules (minority class), may lead to many false negatives. In this work, we empirically assess variants of Generative Adversarial Networks (GANs), an emerging synthetic data generation method, for resolving the data imbalance issue in common software fault prediction datasets. Five GAN variants (CopulaGAN, VanillaGAN, CTGAN, TGAN, and WGANGP) are used to generate synthetic faulty samples that balance the majority and minority classes. We then extensively evaluate prediction models that combine Recursive Feature Elimination (RFE) for feature selection with GAN-based oversampling, as well as models that pair autoencoders for feature extraction with the GAN models. In experiments on five fault datasets from the PROMISE repository, we evaluate six machine learning approaches using precision, recall, F1-score, Area Under the Curve (AUC), and Matthews Correlation Coefficient (MCC) as performance metrics. The results demonstrate that CTGAN combined with RFE and CTGAN paired with autoencoders outperform the other baselines on all datasets, followed by WGANGP and VanillaGAN. According to the comparative analysis, GAN-based oversampling methods yield significant improvements in dealing with data imbalance for software fault prediction.
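A minimal sketch of the shape of the best-performing pipeline, under stated assumptions: select features with RFE, oversample the faulty (minority) class with CTGAN, then train a classifier on the balanced data. The ctgan package, the hyperparameters, and the RandomForest classifier are illustrative choices, not the paper's exact configuration.

# A minimal sketch of an RFE + CTGAN oversampling pipeline for fault prediction.
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from ctgan import CTGAN  # assumed implementation; pip install ctgan

def rfe_ctgan_fit(X, y, n_features=10, epochs=100):
    # X: 2-D NumPy array of software metrics, y: binary labels (1 = faulty module).
    # 1) Recursive feature elimination on the original (imbalanced) data.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_features)
    X_sel = rfe.fit_transform(X, y)

    # 2) Fit CTGAN on the minority (faulty) rows only and synthesise enough
    #    samples to balance the two classes.
    minority = X_sel[y == 1]
    n_needed = int((y == 0).sum() - (y == 1).sum())
    gan = CTGAN(epochs=epochs)
    gan.fit(pd.DataFrame(minority))
    synthetic = gan.sample(n_needed).to_numpy()

    X_bal = np.vstack([X_sel, synthetic])
    y_bal = np.concatenate([y, np.ones(n_needed)])

    # 3) Train the fault-prediction model on the balanced data.
    clf = RandomForestClassifier().fit(X_bal, y_bal)
    return rfe, clf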
Enhanced causal effects estimation based on offline reinforcement learning
Huan Xia, Chaozhe Jiang, Chenyang Zhang
Applied Intelligence 55(4), published 2025-01-07. DOI: 10.1007/s10489-024-06009-5

Abstract: Causal effects estimation is essential for analyzing the effect of a treatment (intervention) on an outcome, but traditional methods rely on the strong assumption of no unobserved confounding. We propose ECEE-RL (Enhanced Causal Effects Estimation based on Reinforcement Learning), a novel architecture that leverages offline reinforcement learning to relax this assumption. ECEE-RL models causal effects estimation as a stateless Markov decision process, allowing adaptive policy optimization through action-reward combinations. By framing estimation as "actions" and sensitivity analysis results as "rewards", ECEE-RL minimizes sensitivity to confounders, including unobserved ones. Theoretical analysis confirms the convergence and robustness of ECEE-RL. Experiments on two simulated datasets demonstrate significant improvements, with CATE MSE reductions ranging from 5.45% to 66.55% and sensitivity significance reductions of up to 98.29% compared to baseline methods, corroborating the theoretical findings on accuracy and robustness. Application to real-world pilot-aircraft interaction data reveals significant causal effects of control behaviors on bioelectrical signals and emotions, demonstrating ECEE-RL's practical utility. While computationally intensive, ECEE-RL offers a promising approach for causal effects estimation, particularly where unobserved confounding may be present, and represents a step towards more reliable causal inference in complex real-world settings.
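The stateless "estimate as action, sensitivity as reward" framing is essentially a bandit problem; the sketch below shows an epsilon-greedy bandit choosing among candidate estimator configurations and being rewarded with the negative of a sensitivity score. The candidate arms and the sensitivity proxy are placeholders, not the paper's estimators or sensitivity analysis.

# A minimal sketch of a stateless action-reward loop over estimator configurations.
import numpy as np

def run_bandit(sensitivity_of_arm, n_arms=4, steps=200, eps=0.1,
               rng=np.random.default_rng(0)):
    q = np.zeros(n_arms)      # running value estimate per arm
    n = np.zeros(n_arms)      # pull counts
    for _ in range(steps):
        arm = rng.integers(n_arms) if rng.random() < eps else int(q.argmax())
        reward = -sensitivity_of_arm(arm)          # lower sensitivity -> higher reward
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]       # incremental mean update
    return int(q.argmax())

def noisy_sensitivity(arm, rng=np.random.default_rng(1)):
    # Placeholder sensitivity score per candidate estimator configuration.
    return abs(rng.normal([0.8, 0.5, 0.2, 0.6][arm], 0.05))

print(run_bandit(noisy_sensitivity))  # settles on the least-sensitive configuration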
A novel embedded cross framework for high-resolution salient object detection
Baoyu Wang, Mao Yang, Pingping Cao, Yan Liu
Applied Intelligence 55(4), published 2025-01-07. DOI: 10.1007/s10489-024-06073-x

Abstract: Salient object detection (SOD) is a fundamental research topic in computer vision that has attracted significant interest from various fields; its rapid development has exposed two issues. (1) Salient regions in high-resolution images exhibit significant differences in location, structure, and edge details, which makes them difficult to recognize and delineate. (2) Traditional salient detection architectures are insensitive to targets in high-resolution feature spaces, which leads to incomplete saliency predictions. To address these limitations, this paper proposes a novel embedded cross framework with a dual-path transformer (ECF-DT) for high-resolution SOD. The framework consists of a dual-path transformer and a unit fusion module for partitioning the salient targets. Specifically, we first design a cross network as the baseline model for salient object detection. The dual-path transformer is then embedded into the cross network to integrate fine-grained visual contextual information and target details while suppressing the disparity of the feature space. To generate more robust feature representations, we also introduce a unit fusion module, which highlights positive information in the feature channels and encourages saliency prediction. Extensive experiments on nine benchmark databases compare ECF-DT with existing state-of-the-art methods. The results indicate that our method outperforms its competitors and accurately detects targets in high-resolution images with large objects, cluttered backgrounds, and complex scenes, achieving MAEs of 0.017, 0.026, and 0.031 on three high-resolution public databases and S-measure scores of 0.909, 0.876, 0.936, 0.854, 0.929, and 0.826 on six low-resolution public databases.
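The unit fusion module is not described in detail in the abstract; the PyTorch sketch below shows one plausible form of the idea, merging two feature maps and gating the result channel-wise so that the more informative responses dominate. The structure is an assumption for illustration, not the paper's module.

# A minimal sketch of a channel-gated fusion of two feature branches.
import torch
import torch.nn as nn

class UnitFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, a, b):
        m = self.merge(torch.cat([a, b], dim=1))   # fuse the two branches
        return m * self.gate(m)                    # emphasise informative channels

x1, x2 = torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56)
print(UnitFusion(64)(x1, x2).shape)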