{"title":"HMgNO: Hybrid multigrid neural operator with low-order numerical solver for partial differential equations","authors":"Yifan Hu , Weimin Zhang , Fukang Yin , Jianping Wu","doi":"10.1016/j.neunet.2025.107649","DOIUrl":"10.1016/j.neunet.2025.107649","url":null,"abstract":"<div><div>Traditional numerical methods face a trade-off between computational cost and accuracy when solving partial differential equations. Low-order solvers are fast but less accurate, while high-order solvers are accurate but much slower. To address this challenge, we propose a novel framework, the hybrid multigrid neural operator (HMgNO). The HMgNO couples a low-order numerical solver with a multigrid neural operator, and the neural operator is used to correct the low-order numerical solutions to obtain high-order accuracy at each fixed time step size. Thus, the HMgNO achieves accurate solutions while ensuring computational efficiency. Moreover, our framework supports multiple types of low-order numerical solvers, such as finite difference and spectral methods. Experiments on the Navier-Stokes, shallow-water, and diffusion-reaction equations demonstrate that the proposed framework achieves the lowest relative error and smallest spectral bias with few model parameters and fast inference speed.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"190 ","pages":"Article 107649"},"PeriodicalIF":6.0,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144185013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Enhancing image-based virtual try-on with Multi-Controlled Diffusion Models
Authors: Weihao Luo, Zezhen Zeng, Yueqi Zhong
Neural Networks, Volume 189, Article 107552. Published 2025-05-23. DOI: 10.1016/j.neunet.2025.107552
Abstract: Image-based virtual try-on digitally overlays clothing onto images of people, letting users preview how garments fit without physically trying them on and thereby enhancing the online shopping experience. While current diffusion-based virtual try-on networks produce high-quality results, they struggle to accurately render garments with textual designs such as logos or prints, which are widespread in the real world and often carry significant brand and cultural identity. To address this challenge, we introduce Multi-Controlled Diffusion Models for Image-based Virtual Try-On (MCDM-VTON), a novel approach that jointly uses global image features and local textual features extracted from garments to control the generation process. Specifically, we introduce an Optical Character Recognition (OCR) model to extract text-style textures from clothing and use the gathered information as text features. These features, together with the inherent global image features, control the denoising process of the diffusion models through a cross-attention-based multimodal feature fusion module. Moreover, by extracting text information from both the generated try-on results and the original garment images with the OCR model, we devise a new content-style loss to supervise the training of the diffusion models, reinforcing the generation of text-style textures. Extensive experiments demonstrate that MCDM-VTON significantly outperforms existing state-of-the-art methods in text preservation and overall visual quality.
Title: VKAD: A novel fault detection and isolation model for uncertainty-aware industrial processes
Authors: Shengbin Zheng, Dechang Pi
Neural Networks, Volume 189, Article 107664. Published 2025-05-22. DOI: 10.1016/j.neunet.2025.107664
Abstract: Fault detection and isolation (FDI) are essential for effective monitoring of industrial processes. Modern industrial processes involve dynamic systems with complex, high-dimensional nonlinearities, posing significant challenges for accurate modeling and analysis. Recent studies have employed deep learning methods to capture and model these complexities. Koopman operator theory offers an alternative perspective: the Koopman operator describes the linear evolution of observables of a nonlinear system in a high-dimensional space, and this linearization simplifies complex nonlinear dynamics, making them easier to analyze and interpret. However, Koopman operator theory does not inherently account for uncertainty in dynamical systems, which can limit its performance in process monitoring. To tackle this issue, we integrate Koopman operator theory with variational autoencoders and propose a novel fault detection and isolation model, the Variational Koopman Anomaly Detector (VKAD). VKAD infers the distribution of observables from time series data of a dynamical system; by advancing this distribution through the Koopman operator over time, it captures the uncertainty in the system's evolution. The uncertainty estimates yielded by VKAD are applicable to both fault detection and fault isolation in industrial processes. The effectiveness of VKAD is illustrated on the Tennessee Eastman Process (TEP) and a real satellite on-orbit telemetry dataset (SAT). Experimental results show that the Fault Detection Rate (FDR) of VKAD surpasses advanced methods on both datasets, while its Fault Alarm Rate (FAR) remains highly competitive.
Title: Semantic-guided compositional scene representation framework
Authors: Qiang Liu, Qiulei Dong, Yangyong Zhang, Xiao Lu, Zhiguo Zhang, Yuqin Chen, Huanzhou Shu, Haixia Wang
Neural Networks, Volume 190, Article 107598. Published 2025-05-22. DOI: 10.1016/j.neunet.2025.107598
Abstract: Neural scene representation methods play a key role in computer vision and graphics, but they struggle to generalize to new, invisible scenes. To address this problem, we propose a semantic-guided compositional scene representation framework consisting of a baseline scene representation module, a semantic mapping module, and a compositional representation strategy. The semantic mapping module learns an embedding correlation by training on visible scenes, where the embedding correlation maps explicit semantic attributes to implicit scene representations. By exploiting this embedding correlation, the framework can represent invisible scenes from semantic attributes alone. In addition, the compositional representation strategy fuses decomposed single-object scene representations into a multi-object scene representation, yielding higher training efficiency for multi-object scenes. Extensive experimental results on three datasets demonstrate that the proposed framework achieves high-accuracy representations for both visible and invisible multi-object scenes.
{"title":"Consensus synchronization via quantized iterative learning for coupled fractional-order time-delayed competitive neural networks with input sharing","authors":"Tianxiang Han , Xingyu Zhou , Shuyu Zhang , Aibing Qiu","doi":"10.1016/j.neunet.2025.107569","DOIUrl":"10.1016/j.neunet.2025.107569","url":null,"abstract":"<div><div>This paper presents the <span><math><msup><mrow><mi>D</mi></mrow><mrow><mi>α</mi></mrow></msup></math></span>-type distributed iterative learning control protocol to synchronize fractional-order competitive neural networks with time delay within a finite time frame. Firstly, the input sharing strategy of such desired competitive neural network is proposed by employing the average weighted combination of neural network, so that each neural network shares its input information to accelerate synchronization speed between competitive neural networks under a fixed communication topology. With the contraction mapping approach and bellman-gronwall inequality, the learning synchronization convergence of the distributed <span><math><msup><mrow><mi>D</mi></mrow><mrow><mi>α</mi></mrow></msup></math></span>-type iterative learning protocol is rigorously analyzed along the iterative axis. Subsequently, the communication topology between neural networks is extended to a iteration-varying topology with the number of iterations, and the learning sufficient conditions for network synchronization are provided. Finally, the efficiency of the designed <span><math><msup><mrow><mi>D</mi></mrow><mrow><mi>α</mi></mrow></msup></math></span>-type iterative learning synchronization methodology is validated through three numerical simulations.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"189 ","pages":"Article 107569"},"PeriodicalIF":6.0,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144115682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: ReactDiff: Latent Diffusion for Facial Reaction Generation
Authors: Jiaming Li, Sheng Wang, Xin Wang, Yitao Zhu, Honglin Xiong, Zixu Zhuang, Qian Wang
Neural Networks, Volume 189, Article 107596. Published 2025-05-22. DOI: 10.1016/j.neunet.2025.107596
Abstract: Given an audio-visual clip of a speaker, facial reaction generation aims to predict the listener's facial reactions. The challenge lies in capturing the relevance between video and audio while balancing appropriateness, realism, and diversity. Prior works have mostly focused on uni-modal inputs or simplified reaction mappings, while recent approaches such as PerFRDiff have explored multi-modal inputs and the one-to-many nature of appropriate reaction mappings. In this work, we propose the Facial Reaction Diffusion (ReactDiff) framework, which integrates a Multi-Modality Transformer with conditional diffusion in the latent space for enhanced reaction generation. Unlike existing methods, ReactDiff leverages intra- and inter-class attention for fine-grained multi-modal interaction, while the latent diffusion process between the encoder and decoder enables diverse yet contextually appropriate outputs. Experimental results demonstrate that ReactDiff significantly outperforms existing approaches, achieving a facial reaction correlation of 0.26 and a diversity score of 0.094 while maintaining competitive realism. The code is open-sourced on GitHub.
Title: Learning double balancing representation for heterogeneous dose–response curve estimation
Authors: Minqin Zhu, Anpeng Wu, Haoxuan Li, Ruoxuan Xiong, Bo Li, Fei Wu, Kun Kuang
Neural Networks, Volume 189, Article 107600. Published 2025-05-22. DOI: 10.1016/j.neunet.2025.107600
Abstract: Estimating individuals' potential responses to varying treatment doses is crucial for decision-making in areas such as precision medicine and management science. Most recent studies predict counterfactual outcomes by learning a covariate representation that is independent of the treatment variable. However, such independence constraints discard much of the covariate information that is useful for counterfactual prediction, especially when the treatment variable is continuous. To tackle this issue, we first theoretically demonstrate the importance of balancing and prognostic representations for unbiased estimation of heterogeneous dose–response curves; that is, the learned representations are constrained to satisfy conditional independence between the covariates and both the treatment variable and the potential responses. Based on this, we propose an end-to-end Contrastive balancing Representation learning Network (CRNet) and a three-stage Weighted Double Balancing Network (WDBN) using a partial distance measure, which estimate heterogeneous dose–response curves without losing the continuity of treatments. Extensive experiments on synthetic and real-world datasets demonstrate that our proposal significantly outperforms previous methods. Code is available at: https://github.com/euzmin/Contrastive-Balancing-Representation-Network-CRNet.
Title: Decoding split-frequency representation for cross-scale tracking
Authors: Yuanming Zhang, Hao Sun
Neural Networks, Volume 189, Article 107587. Published 2025-05-22. DOI: 10.1016/j.neunet.2025.107587
Abstract: Learning tailored target representations is a promising direction in visual object tracking. Most state-of-the-art methods use autoencoders to generate representations by reconstructing the target's appearance. However, these reconstructions are often augmented to mimic scale jitter and alteration, neglecting physical scale observations such as those found in aerial videos. This article addresses representation learning for cross-scale tracking in generalized scenarios. Specifically, we incorporate the target scale directly into the positional encoding, indicating scale through relative pixel density rather than the conventional metric of image resolution. This scale-aware encoding is then integrated into the proposed asymptotic hierarchy of decoders, which reconstructs representations by emphasizing the restoration of high- and low-frequency features at large and tiny scales, respectively. The reconstruction process is guided by supervised learning with split losses, enabling robust cross-scale representations for generic objects. Extensive experiments on six benchmarks (GOT-10k, LaSOT, TrackingNet, DTB70, UAV123, and TNL2K) validate the superior performance of our method. In addition, our tracker runs at 123 frames per second on a GPU, surpassing the previous best autoencoder-based tracker. The code and raw results will be made publicly available at: https://github.com/pellab/DSC.
Title: Corrigendum to "Self-Referencing agents for unsupervised reinforcement learning" [Neural Networks, Volume 188, August 2025, 107448]
Authors: Andrew Zhao, Erle Zhu, Rui Lu, Matthieu Lin, Yong-Jin Liu, Gao Huang
Neural Networks, Volume 189, Article 107632. Published 2025-05-22. DOI: 10.1016/j.neunet.2025.107632
Title: Visual reasoning in object-centric deep neural networks: A comparative cognition approach
Authors: Guillermo Puebla, Jeffrey S. Bowers
Neural Networks, Volume 189, Article 107582. Published 2025-05-21. DOI: 10.1016/j.neunet.2025.107582
Abstract: Achieving visual reasoning is a long-term goal of artificial intelligence. Over the last decade, several studies have applied deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of how well the learned relations generalize. In recent years, however, object-centric representation learning has been put forward as a way to achieve visual reasoning within the deep learning framework. Object-centric models attempt to model input scenes as compositions of objects and the relations between them, using various attention mechanisms to segregate individual objects from the background and from one another. In this work we tested relation learning and generalization in several object-centric models, as well as a ResNet-50 baseline. In contrast to previous research, which has focused heavily on the same-different task to assess relational reasoning in DNNs, we use a set of tasks of varying complexity derived from the comparative cognition literature. Our results show that object-centric models are able to segregate the different objects in a scene, even in many out-of-distribution cases. In our simpler tasks, this improves their capacity to learn and generalize visual relations relative to the ResNet-50 baseline. However, object-centric models still struggle in our more difficult tasks and conditions. We conclude that abstract visual reasoning remains an open challenge for DNNs, including object-centric models.