{"title":"PanoGen++: Domain-adapted text-guided panoramic environment generation for vision-and-language navigation","authors":"Sen Wang , Dongliang Zhou , Liang Xie , Chao Xu , Ye Yan , Erwei Yin","doi":"10.1016/j.neunet.2025.107320","DOIUrl":"10.1016/j.neunet.2025.107320","url":null,"abstract":"<div><div>Vision-and-language navigation (VLN) tasks require agents to navigate three-dimensional environments guided by natural language instructions, offering substantial potential for diverse applications. However, the scarcity of training data impedes progress in this field. This paper introduces PanoGen++, a novel framework that addresses this limitation by generating varied and pertinent panoramic environments for VLN tasks. PanoGen++ incorporates pre-trained diffusion models with domain-specific fine-tuning, employing parameter-efficient techniques such as low-rank adaptation to minimize computational costs. We investigate two settings for environment generation: masked image inpainting and recursive image outpainting. The former maximizes novel environment creation by inpainting masked regions based on textual descriptions, while the latter facilitates agents’ learning of spatial relationships within panoramas. Empirical evaluations on room-to-room (R2R), room-for-room (R4R), and cooperative vision-and-dialog navigation (CVDN) datasets reveal significant performance enhancements: a 2.44% increase in success rate on the R2R test leaderboard, a 0.63% improvement on the R4R validation unseen set, and a 0.75-meter enhancement in goal progress on the CVDN validation unseen set. 
PanoGen++ augments the diversity and relevance of training environments, resulting in improved generalization and efficacy in VLN tasks.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"187 ","pages":"Article 107320"},"PeriodicalIF":6.0,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143611711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
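The low-rank adaptation that PanoGen++ uses for parameter-efficient fine-tuning can be illustrated with a minimal single-layer sketch. The shapes, names, and the numpy setup below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 16, 2

# Frozen pre-trained weight of one linear layer of the diffusion model.
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors. B starts at zero, so the adapted layer
# initially behaves exactly like the pre-trained one.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def adapted_forward(x):
    # y = (W + B A) x; during fine-tuning only A and B receive gradients.
    return (W + B @ A) @ x

x = rng.standard_normal(d_in)
trainable = A.size + B.size   # rank * (d_in + d_out) = 64 parameters
full = W.size                 # d_in * d_out = 256 parameters
```

The computational saving is the point: the adapter trains rank·(d_in + d_out) parameters per layer instead of d_in·d_out, which is what keeps domain-specific fine-tuning cheap.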
{"title":"Self-training EEG discrimination model with weakly supervised sample construction: An age-based perspective on ASD evaluation","authors":"Tengfei Gao , Dan Chen , Meiqi Zhou , Yaodong Wang , Yiping Zuo , Weiping Tu , Xiaoli Li , Jingying Chen","doi":"10.1016/j.neunet.2025.107337","DOIUrl":"10.1016/j.neunet.2025.107337","url":null,"abstract":"<div><div>Deep learning for Electroencephalography (EEG) has become dominant in the tasks of discrimination and evaluation of brain disorders. However, despite its significant successes, this approach has long faced challenges due to the limited availability of labeled samples and the individuality of subjects, particularly in complex scenarios such as Autism Spectrum Disorders (ASD). To facilitate the efficient optimization of EEG discrimination models in the face of these limitations, this study has developed a framework called STEM (Self-Training EEG Model). STEM accomplishes this by self-training the model, which involves initializing it with limited labeled samples and optimizing it with self-constructed samples. (1) <em>Model initialization with multi-task learning:</em> A multi-task model (MAC) comprising an AutoEncoder and a classifier offers guidance for subsequent pseudo-labeling. This guidance includes task-related latent EEG representations and prediction probabilities of unlabeled samples. The AutoEncoder, which consists of depthwise-separable convolutions and BiGRUs, is responsible for learning comprehensive EEG representations through the EEG reconstruction task. Meanwhile, the classifier, trained using limited labeled samples through supervised learning, directs the model’s attention towards capturing task-related features. 
(2) <em>Model optimization aided by pseudo-labeled sample construction:</em> Next, trustworthy pseudo-labels are assigned to the unlabeled samples, and this approach (PLASC) combines the sample’s distance relationship in the feature space mapped by the encoder with the sample’s predicted probability, using the initial MAC model as a reference. The constructed pseudo-labeled samples then support the self-training of MAC to learn individual information from new subjects, potentially enhancing the adaptation of the optimized model to such samples. The STEM framework has undergone an extensive evaluation, comparing it to state-of-the-art counterparts, using resting-state EEG data collected from 175 children with suspected ASD spanning different age groups. The observed results indicate the following: (1) STEM achieves the best performance, with an accuracy of 88.33% and an F1-score of 87.24%, and (2) STEM’s multi-task learning capability outperforms supervised methods when labeled data is limited. More importantly, the use of PLASC improves the model’s performance in ASD discrimination across different age groups, resulting in an increase in accuracy (3%–8%) and F1-scores (4%–10%). These increments are approximately 6% higher than those achieved by the comparison methods.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"187 ","pages":"Article 107337"},"PeriodicalIF":6.0,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143619562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
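The pseudo-labeling idea above, combining classifier confidence with distance structure in the encoder's feature space, can be sketched generically. The function name, the 0.9 threshold, and the agreement rule are assumptions for illustration, not the paper's exact PLASC criterion:

```python
import numpy as np

def select_pseudo_labels(feats, probs, labeled_feats, labeled_y, tau=0.9):
    """Keep a pseudo-label only when the classifier's confident prediction
    agrees with the nearest class centroid in the encoder's feature space."""
    n_classes = probs.shape[1]
    # Class centroids estimated from the few labeled samples.
    centroids = np.stack([labeled_feats[labeled_y == c].mean(axis=0)
                          for c in range(n_classes)])
    dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)          # distance-based assignment
    conf_label = probs.argmax(axis=1)       # probability-based assignment
    keep = (probs.max(axis=1) >= tau) & (nearest == conf_label)
    return keep, conf_label

# Toy 2-class example: sample 0 is confident and consistent, sample 1 is not.
feats = np.array([[0.0, 0.0], [5.0, 5.0]])
probs = np.array([[0.95, 0.05], [0.20, 0.80]])
labeled_feats = np.array([[0.0, 1.0], [4.0, 5.0]])
labeled_y = np.array([0, 1])
keep, labels = select_pseudo_labels(feats, probs, labeled_feats, labeled_y)
```

Requiring the two views to agree is what makes the pseudo-labels "trustworthy": a sample accepted by confidence alone could still sit far from its supposed class in feature space.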
{"title":"Towards zero-shot human–object interaction detection via vision–language integration","authors":"Weiying Xue, Qi Liu, Yuxiao Wang, Zhenao Wei, Xiaofen Xing, Xiangmin Xu","doi":"10.1016/j.neunet.2025.107348","DOIUrl":"10.1016/j.neunet.2025.107348","url":null,"abstract":"<div><div>Human–object interaction (HOI) detection aims to locate human–object pairs and identify their interaction categories in images. Most existing methods primarily focus on supervised learning, which relies on extensive manual HOI annotations. Such heavy reliance on closed-set supervised learning limits their generalization capabilities to unseen object categories. Inspired by the remarkable zero-shot capabilities of vision–language models (VLMs), we propose a novel framework, termed Knowledge Integration to HOI (KI2HOI), that effectively integrates the knowledge of the vision–language model to improve zero-shot HOI detection. Specifically, we propose a ho-pair encoder that supplements contextual information and an interaction-specific semantic representation decoder in our model. Additionally, we propose two fusion strategies to facilitate prior knowledge transfer from the VLM. One is visual-level fusion, which produces interaction features with more global context; the other is language-level fusion, which further enhances the capability of the VLM for HOI detection. Extensive experiments conducted on the mainstream HICO-DET and V-COCO datasets demonstrate that our model outperforms previous methods in various zero-shot and fully supervised settings. 
The source code is available at <span><span>https://github.com/xwyscut/K2HOI</span></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"187 ","pages":"Article 107348"},"PeriodicalIF":6.0,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143611709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
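The zero-shot principle this framework builds on, scoring candidate interactions by similarity between visual features and category text embeddings in a shared VLM space, can be sketched as follows. The function and the toy two-category setup are illustrative of the general idea, not KI2HOI's actual fusion strategies:

```python
import numpy as np

def zero_shot_scores(region_feat, text_embs):
    """Cosine similarity between one visual feature and per-category text
    embeddings, softmaxed into a probability-like score per category.
    Unseen categories need only a text embedding, no training examples."""
    v = region_feat / np.linalg.norm(region_feat)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = t @ v                       # one similarity per category
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

# Toy example: two interaction categories embedded on orthogonal axes.
text_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
scores = zero_shot_scores(np.array([2.0, 0.5]), text_embs)
```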
{"title":"Improving transferability of adversarial examples via statistical attribution-based attacks","authors":"Hegui Zhu , Yanmeng Jia , Yue Yan , Ze Yang","doi":"10.1016/j.neunet.2025.107341","DOIUrl":"10.1016/j.neunet.2025.107341","url":null,"abstract":"<div><div>Adversarial attacks are significant in uncovering vulnerabilities and assessing the robustness of deep neural networks (DNNs), offering profound insights into their internal mechanisms. Feature-level attacks, a potent approach, craft adversarial examples by extensively corrupting the intermediate-layer features of the source model during each iteration. However, such attacks often rely on imprecise metrics to assess the significance of features, which can constrain the transferability of the resulting adversarial examples. To address these issues, this paper introduces the Statistical Attribution-based Attack (SAA) method, which emphasizes finding feature importance representations and refining optimization objectives, thereby achieving stronger attack performance. To calculate the Comprehensive Gradient for more accurate feature representation, we introduce the Region-wise Feature Disturbance and Gradient Information Aggregation, which can effectively disrupt the model’s attention focus areas. Subsequently, a statistical attribution-based approach is employed, leveraging the average feature information across layers to provide a more advantageous optimization objective. Experiments have validated the superiority of this method. Specifically, SAA improves the attack success rate by 9.3% compared with the second-best method. 
When combined with input transformation methods, it achieves an average success rate of 79.2% against eight leading defense models.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"187 ","pages":"Article 107341"},"PeriodicalIF":6.0,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143611707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
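The core idea of aggregating gradients over region-wise input disturbances, so that feature importance is estimated from many randomly masked views rather than one, can be illustrated on a toy analytic model. The linear "feature extractor", mask rate, and step size below are assumptions for illustration, not the paper's Comprehensive Gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))   # toy linear feature extractor: f(z) = W z
x = rng.standard_normal(16)        # clean input
target = np.ones(4)                # feature values the attack pushes toward

def loss_grad(z):
    # Analytic gradient of 0.5 * ||W z - target||^2 with respect to z.
    return W.T @ (W @ z - target)

# Aggregate gradients over random region masks: each mask disturbs part of
# the input, and averaging smooths the importance estimate.
n_masks = 30
agg = np.zeros_like(x)
for _ in range(n_masks):
    mask = rng.random(x.shape) > 0.3   # randomly drop ~30% of the input
    agg += loss_grad(x * mask)
agg /= n_masks

# One signed-gradient step bounded by eps yields the adversarial example.
eps = 0.1
x_adv = x + eps * np.sign(agg)
```

In a real feature-level attack the gradient would come from backpropagation through an intermediate layer of a deep network; the averaging-over-masks structure is the transferable part of the sketch.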
{"title":"FingerPoseNet: A finger-level multitask learning network with residual feature sharing for 3D hand pose estimation","authors":"Tekie Tsegay Tewolde , Ali Asghar Manjotho , Prodip Kumar Sarker , Zhendong Niu","doi":"10.1016/j.neunet.2025.107315","DOIUrl":"10.1016/j.neunet.2025.107315","url":null,"abstract":"<div><div>Hand pose estimation approaches commonly rely on shared hand feature maps to regress the 3D locations of all hand joints. Consequently, they struggle to enhance finger-level features, which are invaluable in capturing joint-to-finger associations and articulations. To address this limitation, we propose a finger-level multitask learning network with residual feature sharing, named FingerPoseNet, for accurate 3D hand pose estimation from a depth image. FingerPoseNet comprises three stages: (a) a shared base feature map extraction backbone based on pre-trained ResNet-50; (b) a finger-level multitask learning stage that extracts and enhances feature maps for each finger and the palm; and (c) a multitask fusion layer for consolidating the estimation results obtained by each subtask. We exploit multitask learning by decoupling the hand pose estimation task into six subtasks dedicated to each finger and the palm. Each subtask is responsible for subtask-specific feature extraction, enhancement, and 3D keypoint regression. To enhance subtask-specific features, we propose a residual feature-sharing approach scaled up to mine supplementary information from all subtasks. 
Experiments performed on five challenging public hand pose datasets, including ICVL, NYU, MSRA, Hands-2019-Task1, and HO3D-v3 demonstrate significant improvements in accuracy compared with state-of-the-art approaches.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"187 ","pages":"Article 107315"},"PeriodicalIF":6.0,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143611710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
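Decoupling a 21-joint hand pose into six per-part subtasks and fusing the per-part regressions back into a single pose can be sketched as follows. The joint-index grouping is hypothetical, chosen only to make the split/fuse structure concrete; it is not the actual layout of the datasets or the paper's fusion layer:

```python
import numpy as np

N_JOINTS = 21
PARTS = {                  # hypothetical index layout: palm + five fingers
    "palm":   [0],
    "thumb":  [1, 2, 3, 4],
    "index":  [5, 6, 7, 8],
    "middle": [9, 10, 11, 12],
    "ring":   [13, 14, 15, 16],
    "pinky":  [17, 18, 19, 20],
}

def fuse(per_part_preds):
    """Consolidate per-subtask 3D regressions into one (21, 3) hand pose."""
    pose = np.zeros((N_JOINTS, 3))
    for name, idx in PARTS.items():
        pose[idx] = per_part_preds[name]   # each subtask owns its joints
    return pose

# Dummy per-part outputs: each subtask regresses only its own joints.
preds = {name: np.full((len(idx), 3), float(i))
         for i, (name, idx) in enumerate(PARTS.items())}
pose = fuse(preds)
```

The design point is that each subtask head sees (and can specialize for) only one finger's articulation, while the fusion step restores the full-hand output expected by the loss.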
{"title":"Probabilistic memory auto-encoding network for abnormal behavior detection in surveillance video","authors":"Jinsheng Xiao , Jingyi Wu , Shurui Wang , Qiuze Yu , Honggang Xie , Yuan-Fang Wang","doi":"10.1016/j.neunet.2025.107299","DOIUrl":"10.1016/j.neunet.2025.107299","url":null,"abstract":"<div><div>Abnormal behavior detection in surveillance video, as one of the essential functions in the intelligent surveillance system, plays a vital role in anti-terrorism, maintaining stability, and ensuring social security. To address the extreme imbalance between normal behavior data and abnormal behavior data, a probabilistic memory model-based network is designed to learn the distribution of normal behaviors and guide the detection of abnormal behavior. An auto-encoding model is employed as the backbone network, and the gap between the predicted future frame and the real frame is used to measure the degree of abnormality. An autoregressive conditional probability estimation model and a normal distribution memory model are employed as auxiliary modules to achieve the prediction of normal frames. When extracting temporal and spatial features in the backbone network, causal three-dimensional convolutions and time-dimension shared fully connected layers are used to avoid future information leakage and preserve the temporal order of information. In addition, from the perspective of probability entropy and behavioral modality diversity, an autoregressive probability model is proposed to fit the distribution of input normal frames, so that the network converges to the low-entropy state of the normal behavior distribution. The memory module stores the features of normal behavior from historical data and injects them into the current input. The memory vector and the encoding vector are concatenated along the time dimension and input to the decoder, realizing normal frame prediction. 
Using public datasets, ablation and comparison experiments show that the proposed algorithm has significant advantages in anomaly detection.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"187 ","pages":"Article 107299"},"PeriodicalIF":6.0,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143611708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
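Measuring "the gap between the predicted future frame and the real frame" as an anomaly score is commonly done with MSE or PSNR in this line of work. The sketch below uses PSNR and assumes frames scaled to [0, 1]; the paper's exact scoring function may differ:

```python
import numpy as np

def anomaly_score(pred_frame, real_frame):
    """PSNR between predicted and real frames (values assumed in [0, 1]).

    The network is trained only on normal behavior, so frames it predicts
    poorly (low PSNR) are flagged as abnormal.
    """
    mse = np.mean((pred_frame - real_frame) ** 2)
    if mse == 0.0:
        return np.inf                 # perfect prediction: maximally normal
    return 10.0 * np.log10(1.0 / mse)

real = np.zeros((4, 4))
score = anomaly_score(real + 0.1, real)   # mse = 0.01 -> PSNR = 20 dB
```

In deployment a threshold on this score (often after per-video min-max normalization) separates normal from abnormal frames.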
{"title":"Exponential stability of infinite-dimensional impulsive stochastic systems with Poisson jumps under aperiodically intermittent control","authors":"Yiqun Liu, Lili Chen, Yanfeng Zhao, Zhen Wang","doi":"10.1016/j.neunet.2025.107331","DOIUrl":"10.1016/j.neunet.2025.107331","url":null,"abstract":"<div><div>This paper studies the problem of mean square exponential stability (ES) for a class of impulsive stochastic infinite-dimensional systems with Poisson jumps (ISIDSP) using aperiodically intermittent control (AIC). It provides a detailed analysis of impulsive disturbances, and the related inequalities are given for the two cases in which the impulse perturbation occurs at the start points of the control and rest intervals, or at non-start points. Additionally, by virtue of Yosida approximating systems combined with the Lyapunov method, graph theory, and the above inequalities, criteria for ES of the above impulsive stochastic infinite-dimensional systems are established under AIC for these two perturbation scenarios. These criteria elucidate the effects of the impulsive perturbation strength, the ratio of control period to rest period, and network topology on ES. Finally, the theoretical results are applied to a class of neural networks with reaction–diffusion processes, and the effectiveness of the findings is validated through numerical simulations.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"187 ","pages":"Article 107331"},"PeriodicalIF":6.0,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143592424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
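Aperiodically intermittent control typically takes the following generic form; the notation here (control spans, rest spans, gain, ratio bound) is a standard illustration of the concept, not the paper's exact symbols or conditions:

```latex
% Controller active on control spans, switched off on rest spans.
u(t) =
\begin{cases}
K\,x(t), & t \in [t_k,\, s_k) \ \text{(control span)}, \\
0,       & t \in [s_k,\, t_{k+1}) \ \text{(rest span)},
\end{cases}
\qquad k = 0, 1, 2, \ldots
```

"Aperiodic" means the span lengths $s_k - t_k$ and $t_{k+1} - s_k$ may vary with $k$; stability criteria of this type usually constrain a lower bound on the fraction of time control is active, e.g. $\inf_k \frac{s_k - t_k}{t_{k+1} - t_k} \geq \theta \in (0, 1]$, together with bounds on the impulse strengths.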
{"title":"SHFormer: Dynamic spectral filtering convolutional neural network and high-pass kernel generation transformer for adaptive MRI reconstruction","authors":"Sriprabha Ramanarayanan , Rahul G.S. , Mohammad Al Fahim , Keerthi Ram , Ramesh Venkatesan , Mohanasankar Sivaprakasam","doi":"10.1016/j.neunet.2025.107334","DOIUrl":"10.1016/j.neunet.2025.107334","url":null,"abstract":"<div><div>Attention Mechanism (AM) selectively focuses on essential information for imaging tasks and captures relationships between regions from distant pixel neighborhoods to compute feature representations. Accelerated magnetic resonance image (MRI) reconstruction can benefit from AM, as the imaging process involves acquiring Fourier domain measurements that influence the image representation in a non-local manner. However, AM-based models are more adept at capturing low-frequency information and have limited capacity in constructing high-frequency representations, restricting the models to smooth reconstruction. Secondly, AM-based models need mode-specific retraining for multimodal MRI data as their knowledge is restricted to local contextual variations within modes that might be inadequate to capture the diverse transferable features across heterogeneous data domains. To address these challenges, we propose a neuromodulation-based discriminative multi-spectral AM for scalable MRI reconstruction, that can (i) propagate the context-aware high-frequency details for high-quality image reconstruction, and (ii) capture features reusable to deviated unseen domains in multimodal MRI, to offer high practical value for the healthcare industry and researchers. The proposed network consists of a spectral filtering convolutional neural network to capture mode-specific transferable features to generalize to deviated MRI data domains and a dynamic high-pass kernel generation transformer that focuses on high-frequency details for improved reconstruction. 
We have evaluated our model on various aspects, such as comparative studies in supervised and self-supervised learning, diffusion model-based training, closed-set and open-set generalization under heterogeneous MRI data, and interpretation-based analysis. Our results show that the proposed method offers scalable and high-quality reconstruction with best improvement margins of <span><math><mo>∼</mo></math></span>1 dB in PSNR and <span><math><mo>∼</mo></math></span>0.01 in SSIM under unseen scenarios. Our code is available at <span><span>https://github.com/sriprabhar/SHFormer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"187 ","pages":"Article 107334"},"PeriodicalIF":6.0,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143611705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
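A fixed spectral high-pass filter conveys the kind of frequency selection that SHFormer's dynamic high-pass kernels learn adaptively. This numpy sketch (function name and cutoff convention assumed, not from the released code) zeroes the low-frequency band of an image's Fourier spectrum:

```python
import numpy as np

def high_pass(img, cutoff=0.1):
    """Zero out spatial frequencies below `cutoff` (fraction of Nyquist)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    fy = np.fft.fftshift(np.fft.fftfreq(img.shape[0]))
    fx = np.fft.fftshift(np.fft.fftfreq(img.shape[1]))
    # Radial frequency normalized so the Nyquist frequency maps to 1.0.
    radius = np.sqrt((2.0 * fy[:, None]) ** 2 + (2.0 * fx[None, :]) ** 2)
    F[radius < cutoff] = 0.0           # discard the low-frequency band
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

# A checkerboard is pure Nyquist-frequency content plus its mean, so the
# filter removes only the mean.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
```

Learning the kernel dynamically, as the paper proposes, amounts to predicting a data-dependent version of this frequency mask instead of a fixed radial threshold.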