Neural Networks. Pub Date: 2025-04-22. DOI: 10.1016/j.neunet.2025.107481
Lexiang Hu, Yikang Li, Zhouchen Lin

Symmetry discovery for different data types

Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance. However, constructing equivariant neural networks typically requires prior knowledge of data types and symmetries, which is difficult to obtain in most tasks. In this paper, we propose LieSD, a method for discovering symmetries via trained neural networks that approximate the input–output mappings of the tasks. It characterizes equivariance and invariance (a special case of equivariance) of continuous groups using Lie algebras, and solves for the Lie algebra space directly from the inputs, outputs, and gradients of the trained neural network. We then extend the method to multi-channel data and tensor data, respectively. We validate the performance of LieSD on tasks with symmetries, such as the two-body problem, moment-of-inertia matrix prediction, top quark tagging, and rotated MNIST. Compared with the baseline, LieSD accurately determines the number of Lie algebra bases without the need for expensive group sampling. Furthermore, LieSD performs well on non-uniform datasets, whereas GAN-based methods fail. Code and data are available at https://github.com/hulx2002/LieSD.
Neural Networks. Pub Date: 2025-04-21. DOI: 10.1016/j.neunet.2025.107474
Xuan Rao, Bo Zhao, Derong Liu

On robust learning of memory attractors with noisy deep associative memory networks

Developing the computational mechanism of memory systems is a long-standing focus in machine learning and neuroscience. Recent studies have shown that overparameterized autoencoders (OAEs) implement associative memory (AM) by encoding training data as attractors. However, learning memory attractors requires that the magnitudes of all eigenvalues of the input–output Jacobian matrix be strictly less than one. Motivated by the observed strong negative correlation between attractor robustness and the largest singular value of the Jacobian matrix, we develop noisy overparameterized autoencoders (NOAEs) that learn robust attractors by injecting random noise into their inputs during training. Theoretical analysis shows that the training objective of the NOAE approximately minimizes an upper bound on the weighted sum of the reconstruction error and the square of the largest singular value. Extensive experiments on numerical and image-based datasets show that NOAEs not only increase the success rate of training samples becoming attractors, but also improve attractor robustness. Code is available at https://github.com/RaoXuan-1998/neural-netowrk-journal-NOAE.
Neural Networks. Pub Date: 2025-04-21. DOI: 10.1016/j.neunet.2025.107455
Liu Yang, Siting Liu, Stanley J. Osher

Fine-tune language models as multi-modal differential equation solvers

In the growing domain of scientific machine learning, in-context operator learning has shown notable potential for building foundation models: the model is trained to learn operators and solve differential equations from prompted data at inference time, without weight updates. However, the current models' overdependence on function data overlooks invaluable human insight into the operator. To address this, we transform in-context operator learning into a multi-modal paradigm. In particular, taking inspiration from the recent success of large language models, we propose using "captions" to integrate human knowledge about the operator, expressed through natural-language descriptions and equations. We also introduce a novel approach to train a language-model-like architecture, or directly fine-tune existing language models, for in-context operator learning. We beat the baseline on single-modal learning tasks and demonstrate the effectiveness of multi-modal learning in enhancing performance and reducing function data requirements. The proposed method not only advances the in-context operator learning paradigm but also opens a new path for applying language models.
{"title":"SpikeCLIP: A contrastive language–image pretrained spiking neural network","authors":"Changze Lv , Tianlong Li , Wenhao Liu , Yufei Gu , Jianhan Xu , Cenyuan Zhang , Muling Wu , Xiaoqing Zheng , Xuanjing Huang","doi":"10.1016/j.neunet.2025.107475","DOIUrl":"10.1016/j.neunet.2025.107475","url":null,"abstract":"<div><div>Spiking Neural Networks (SNNs) have emerged as a promising alternative to conventional Artificial Neural Networks (ANNs), demonstrating comparable performance in both visual and linguistic tasks while offering the advantage of improved energy efficiency. Despite these advancements, the integration of linguistic and visual features into a unified representation through spike trains poses a significant challenge, and the application of SNNs to multimodal scenarios remains largely unexplored. This paper presents SpikeCLIP, a novel framework designed to bridge the modality gap in spike-based computation. Our approach employs a two-step recipe: an “alignment pre-training” to align features across modalities, followed by a “dual-loss fine-tuning” to refine the model’s performance. Extensive experiments reveal that SNNs achieve results on par with ANNs while substantially reducing energy consumption across various datasets commonly used for multimodal model evaluation. Furthermore, SpikeCLIP maintains robust image classification capabilities, even when dealing with classes that fall outside predefined categories. This study marks a significant advancement in the development of energy-efficient and biologically plausible multimodal learning systems.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"188 ","pages":"Article 107475"},"PeriodicalIF":6.0,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143870673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DGPrompt: Dual-guidance prompts generation for vision-language models","authors":"Tai Zheng, Zhen-Duo Chen, Zi-Chao Zhang, Zhen-Xiang Ma, Li-Jun Zhao, Chong-Yu Zhang, Xin Luo, Xin-Shun Xu","doi":"10.1016/j.neunet.2025.107472","DOIUrl":"10.1016/j.neunet.2025.107472","url":null,"abstract":"<div><div>Introducing learnable prompts into CLIP and fine-tuning them have demonstrated excellent performance across many downstream tasks. However, existing methods have insufficient interaction between modalities and neglect the importance of hierarchical contextual information, leading to ineffective alignment in both the visual and textual representation spaces. Additionally, CLIP is highly sensitive to prompts, making learnable prompts prone to overfitting on seen classes, which results in the forgetting of general knowledge of CLIP and severely impair generalization ability on unseen classes. To address these issues, we propose an original <span><math><mi>D</mi></math></span>ual-<span><math><mi>G</mi></math></span>uidance <span><math><mi>Prompt</mi></math></span>s Generation (<span><math><mi>DGPrompt</mi></math></span>) method that promotes alignment between visual and textual spaces while ensuring the continuous retention of general knowledge. The main ideas of DGPrompt are as follows: 1) The extraction of image and text embeddings are guided mutually by generating visual and textual prompts, making full use of complementary information from both modalities to align visual and textual spaces. 2) The prompt-tuning process is restrained by a retention module, reducing the forgetting of general knowledge. Extensive experiments conducted in settings of base-to-new class generalization and few-shot learning demonstrate the superiority of the proposed method. Compared with the baseline method CLIP and the state-of-the-art method MaPLe, DGPrompt exhibits favorable performance and achieves an absolute gain of 7.84% and 0.99% on overall harmonic mean, averaged over 11 diverse image recognition datasets.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"188 ","pages":"Article 107472"},"PeriodicalIF":6.0,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143870675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Networks. Pub Date: 2025-04-17. DOI: 10.1016/j.neunet.2025.107473
Zhiqiang Wan, Yi-Fei Pu, Qiang Lai

Multiscroll hidden attractor in memristive autapse neuron model and its memristor-based scroll control and application in image encryption

In current neurodynamics studies, memristor models based on polynomial or multiple nested composite functions are primarily employed to generate multiscroll attractors, but their complex mathematical form restricts both research and application. To address this issue, without relying on polynomial or multiple nested composite functions, this study devises a unique memristor model and a memristive autapse HR (MAHR) neuron model featuring a multiscroll hidden attractor. Notably, the number of scrolls within the multiscroll hidden attractor is regulated by the simulation time. Furthermore, a simple control factor is incorporated into the memristor to improve the MAHR neuron model. Numerical analysis shows that the number of scrolls within the multiscroll hidden attractor of the improved MAHR neuron model can be conveniently adjusted by changing only a single parameter or initial condition of the memristor. Moreover, a microcontroller-based hardware experiment confirms that the improved MAHR neuron model is physically feasible. Finally, an elegant image encryption scheme is proposed to explore the real-world applicability of the improved MAHR neuron model.
{"title":"Adaptive token selection for scalable point cloud transformers","authors":"Alessandro Baiocchi , Indro Spinelli , Alessandro Nicolosi , Simone Scardapane","doi":"10.1016/j.neunet.2025.107477","DOIUrl":"10.1016/j.neunet.2025.107477","url":null,"abstract":"<div><div>The recent surge in 3D data acquisition has spurred the development of geometric deep learning models for point cloud processing, boosted by the remarkable success of transformers in natural language processing. While point cloud transformers (PTs) have achieved impressive results recently, their quadratic scaling with respect to the point cloud size poses a significant scalability challenge for real-world applications. To address this issue, we propose the Adaptive Point Cloud Transformer (AdaPT), a standard PT model augmented by an adaptive token selection mechanism. AdaPT dynamically reduces the number of tokens during inference, enabling efficient processing of large point clouds. Furthermore, we introduce a budget mechanism to flexibly adjust the computational cost of the model at inference time without the need for retraining or fine-tuning separate models. Our extensive experimental evaluation on point cloud classification tasks demonstrates that AdaPT significantly reduces computational complexity while maintaining competitive accuracy compared to standard PTs. The code for AdaPT is publicly available at <span><span>https://github.com/ispamm/adaPT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"188 ","pages":"Article 107477"},"PeriodicalIF":6.0,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Networks. Pub Date: 2025-04-15. DOI: 10.1016/j.neunet.2025.107460
Wenbo Wu, Lei Liu, Jingtao Wang, Bin Li, Zongyu Ye, Wangmeng Zuo, Yun Pan

Multi-stage network for single image deblurring based on dual-domain window mamba

Multi-stage methods have proven effective and are widely used in image deblurring research. These methods, usually built on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs), have limitations, including the inability to capture global contextual information and computational complexity that grows quadratically with image resolution. Additionally, although current methods incorporate frequency-domain information, they do not sufficiently explore the interrelationships among different frequencies. To address these issues, we propose a Multi-Stage Visual Dual-Domain Window Mamba (DDWMamba) approach for image deblurring, leveraging the benefits of state space models (SSMs) for image data. First, to achieve better deblurring, we adopt a multi-stage design in which each stage maintains the details and global information of the original-resolution image. Second, we propose a DDWMamba block, comprising a Spatial Window Visual Mamba and a Frequency Window Visual Mamba, to fully explore the correlations between different pixels in both the spatial and frequency domains. Finally, to implement a coarse-to-fine design within the multi-stage method and reduce model complexity, we set window operations with different window sizes for each stage. DDWMamba is extensively evaluated on several benchmark datasets and achieves superior performance compared to existing state-of-the-art deblurring methods.
Neural Networks. Pub Date: 2025-04-13. DOI: 10.1016/j.neunet.2025.107495
Shi Yin, Hui Liu

Driving scene image dehazing model based on multi-branch and multi-scale feature fusion

Image dehazing is critical for enhancing image quality in applications such as autonomous driving, surveillance, and remote sensing. This paper presents an innovative image dehazing model based on a multi-branch and multi-scale feature fusion network that leverages spatial and frequency information. The model features a multi-branch architecture that combines local and global features through depthwise separable convolutions and state space models, effectively capturing both detailed and comprehensive information to improve dehazing performance. Additionally, a specialized module integrates spatial and frequency domain information by utilizing convolutional layers and Fourier transforms, enabling comprehensive haze removal through the fusion of these two domains. A feature fusion mechanism incorporates channel attention and residual connections, dynamically adjusting the importance of different channel features while preserving the global structural information of the input image. Furthermore, this is the first model to combine Mamba and convolution layers for driving scene image dehazing, achieving global feature extraction with linear complexity. Each image is processed in only 0.030 s, with a frame rate of 32.41 FPS and a processing efficiency of 67.96 MPx/s, ensuring high efficiency suitable for real-time applications. Extensive experiments on real-world foggy driving scene datasets demonstrate the superior performance of the proposed method, providing reliable visual perception capabilities and significantly improving adaptability and robustness in complex environments.
{"title":"YOLOv8-G2F: A portable gesture recognition optimization algorithm","authors":"Zhao Feng , Junjian Huang , Wei Zhang , Shiping Wen , Yangpeng Liu , Tingwen Huang","doi":"10.1016/j.neunet.2025.107469","DOIUrl":"10.1016/j.neunet.2025.107469","url":null,"abstract":"<div><div>Hand gesture recognition (HGR) is a significant research area with applications in human–computer interaction, artificial intelligence, and more. In the early stage of development of HGR, there are high hardware costs and large usage requirements. To reduce the high cost expenditure and increase the application scenario, deep learning has played a crucial role. With the greater depth perception and more computing power, currently HGR is more about continuous recognition in space based on vedio. But in this article, it considers that there is a growing demand for lightweight networks with high precision for end-to-end HGR applications. In that, it still tends to recognize consecutive video frames and get results quickly. This paper introduces an enhanced network called YOLOv8-G2F, which is based on YOLOv8. It incorporates improved lightweight modules not only replace the traditional convolution module of the network’s backbone and neck but also for the C2f module in YOLOv8. The network employs linear transformations, group convolution, and depthwise separable convolution to extract image information using simpler networks. Furthermore, model pruning is also used to further reduce model size and improve accuracy. The improved model achieved a recognition accuracy of 99.2% on the nus-ii gesture dataset with a model size of 2.33 MB. After extensive comparison and ablation experiments, YOLOv8-G2F demonstrated significant progress over existing algorithms.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"188 ","pages":"Article 107469"},"PeriodicalIF":6.0,"publicationDate":"2025-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143833250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}