Neural Networks, vol. 191, art. 107782. Pub date: 2025-11-01 (Epub 2025-07-01). DOI: 10.1016/j.neunet.2025.107782
S-YOLO: An enhanced small object detection method based on adaptive gating strategy and dynamic multi-scale focus module.
Zengnan Wang, Feng Yan, Liejun Wang, Yabo Yin, Jiahuan Lin

Abstract: Detecting small objects in drone aerial imagery presents significant challenges, particularly when algorithms must operate in real-time under computational constraints. To address this issue, we propose S-YOLO, an efficient and streamlined small object detection framework based on YOLOv10. The S-YOLO architecture emphasizes three key innovations: (1) Enhanced Small Object Detection Layers: These layers augment semantic richness to improve detection of diminutive targets. (2) C2fGCU Module: Incorporating Gated Convolutional Units (GCU), this module adaptively modulates activation strength through deep feature analysis, enabling the model to concentrate on salient information while effectively mitigating background interference. (3) Dynamic Multi-Scale Fusion (DMSF) Module: By integrating SE-Norm with multi-scale feature extraction, this component dynamically recalibrates feature weights to optimize cross-scale information integration and focus. S-YOLO surpasses YOLOv10-n, achieving mAP50:95 improvements of 5.3%, 4.4%, and 1.4% on the VisDrone2019, AI-TOD, and DOTA1.0 datasets, respectively. Notably, S-YOLO maintains fewer parameters than YOLOv10-n while processing 285 images per second, establishing it as a highly efficient solution for real-time small object detection in aerial imagery.
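The gating mechanism in innovation (2) can be illustrated with a minimal numpy sketch. This is not the paper's C2fGCU implementation; the sigmoid gate, the identity gate branch, and all names below are assumptions used only to show how a learned gate in (0, 1) can suppress background responses while passing salient ones:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_conv_unit(features, gate_weights):
    """Illustrative gating: a sigmoid gate in (0, 1) scales each
    activation, damping low-salience (background) responses.
    `gate_weights` stands in for the learned gate branch."""
    gate = sigmoid(features @ gate_weights)  # per-position gate in (0, 1)
    return features * gate                   # modulated activations

# toy feature map: 4 positions x 3 channels
feats = np.array([[ 2.0, -1.0, 0.5],
                  [-3.0,  0.2, 1.0],
                  [ 0.0,  0.0, 0.0],
                  [ 5.0, -5.0, 2.0]])
W = np.eye(3)  # identity gate branch, for the demo only
out = gated_conv_unit(feats, W)
```

Because the gate is bounded in (0, 1), each output activation is never larger in magnitude than its input, which is what lets such a unit damp background clutter while keeping strong responses.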
Neural Networks, vol. 191, art. 107858. Pub date: 2025-11-01 (Epub 2025-07-11). DOI: 10.1016/j.neunet.2025.107858
Curriculum negative mining for temporal networks.
Ziyue Chen, Tongya Zheng, Mingli Song

Abstract: Temporal networks are effective in capturing the evolving interactions of networks over time, such as social networks and e-commerce networks. In recent years, researchers have primarily concentrated on developing specific model architectures for Temporal Graph Neural Networks (TGNNs) in order to improve the representation quality of temporal nodes and edges. However, limited attention has been given to the quality of negative samples during the training of TGNNs. When compared with static networks, temporal networks present two specific challenges for negative sampling: positive sparsity and positive shift. Positive sparsity refers to the presence of a single positive sample amidst numerous negative samples at each timestamp, while positive shift relates to the variations in positive samples across different timestamps. To robustly address these challenges in training TGNNs, we introduce Curriculum Negative Mining (CurNM), a model-aware curriculum learning framework that adaptively adjusts the difficulty of negative samples. Within this framework, we first establish a dynamically updated negative pool that balances random, historical, and hard negatives to address the challenges posed by positive sparsity. Secondly, we implement a temporal-aware negative selection module that focuses on learning from the disentangled factors of recently active edges, thus accurately capturing shifting preferences. Finally, the selected negatives are combined with annealing random negatives to support stable training. Extensive experiments on 12 datasets and 3 TGNNs demonstrate that our method outperforms baseline methods by a significant margin. Additionally, thorough ablation studies and parameter sensitivity experiments verify the usefulness and robustness of our approach. Our code is available at https://github.com/zziyue83/CurNM.
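The dynamically updated negative pool can be sketched as a difficulty-controlled mixture of random, historical, and hard negatives. The 50/50 split between hard and historical negatives and the linear schedule below are assumptions for illustration, not CurNM's actual curriculum:

```python
import random

def build_negative_pool(all_nodes, historical, hard, difficulty, pool_size=10):
    """Illustrative mix: as `difficulty` in [0, 1] rises, the pool shifts
    from easy random negatives toward historical and hard negatives."""
    n_hard = int(pool_size * difficulty * 0.5)
    n_hist = int(pool_size * difficulty * 0.5)
    n_rand = pool_size - n_hard - n_hist
    pool  = random.sample(hard, min(n_hard, len(hard)))
    pool += random.sample(historical, min(n_hist, len(historical)))
    pool += random.sample(all_nodes, n_rand)
    return pool

random.seed(0)
nodes = list(range(100))
hist, hard = [1, 2, 3, 4, 5], [6, 7, 8, 9, 10]
pool_easy = build_negative_pool(nodes, hist, hard, difficulty=0.0)  # all random
pool_hard = build_negative_pool(nodes, hist, hard, difficulty=1.0)  # all mined
```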
Neural Networks, vol. 191, art. 107845. Pub date: 2025-11-01 (Epub 2025-07-14). DOI: 10.1016/j.neunet.2025.107845
Pixel adaptive deep-unfolding neural network with state space model for image deraining.
Yao Xiao, Youshen Xia

Abstract: Rain streaks affect visual quality and interfere with high-level vision tasks on rainy days, so removing rain from captured rainy images is important in computer vision applications. Recently, deep-unfolding neural networks (DUNs) have shown their effectiveness on image deraining. Yet two issues need to be further addressed: 1) deep unfolding networks typically use convolutional neural networks (CNNs), which lack the ability to perceive global structures, limiting the applicability of the network model; 2) their gradient descent modules usually rely on a scalar step size, which limits the adaptability of the method to different input images. To address these two issues, we propose a new image deraining method based on a pixel adaptive deep unfolding network with state space models. The proposed network mainly consists of an adaptive pixel-wise gradient descent (APGD) module and a stage fusion proximal mapping (SFPM) module. The APGD module overcomes the inflexibility of a scalar step size by adaptively adjusting the gradient step size for each pixel based on the previous stage's features. The SFPM module adopts a dual-branch architecture combining CNNs with state space models (SSMs) to enhance the perception of both local and global structures. Compared to Transformer-based models, SSMs enable efficient long-range dependency modeling with linear complexity. In addition, we introduce a stage feature fusion with a Fourier transform mechanism to reduce information loss during the unfolding process, ensuring key features are effectively propagated. Extensive experiments on multiple public datasets demonstrate that our method consistently outperforms state-of-the-art deraining methods in terms of quantitative metrics and visual quality. The source code is available at https://github.com/cassiopeia-yxx/PADUM.
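The pixel-adaptive gradient step that distinguishes APGD from a scalar-step unfolding stage can be shown on a toy quadratic data term. The data term and the fixed step map are illustrative assumptions; in the paper the step map is predicted from previous-stage features:

```python
import numpy as np

def pixel_adaptive_step(x, y, step_map):
    """One illustrative unfolding stage: gradient descent on the data
    term 0.5 * ||x - y||^2 with a per-pixel step size `step_map`.
    A scalar step size would correspond to a constant map."""
    grad = x - y                 # gradient of the toy data term
    return x - step_map * grad   # element-wise (pixel-adaptive) update

rainy = np.zeros((4, 4))         # toy target
x0    = np.ones((4, 4))          # current estimate
steps = np.full((4, 4), 0.5)     # conservative step almost everywhere
steps[0, 0] = 1.0                # this pixel converges in one step
x1 = pixel_adaptive_step(x0, rainy, steps)
```

The pixel given step 1.0 lands exactly on the target in one stage while the rest move halfway, which is the adaptability a single scalar step cannot provide.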
Neural Networks, vol. 191, art. 107774. Pub date: 2025-11-01. DOI: 10.1016/j.neunet.2025.107774
Real-time data-efficient portrait stylization via geometric alignment.
Xinrui Wang, Zhuoru Li, Xuanyu Yin, Xiao Zhou, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo

Abstract: Portrait stylization aims to imbue portrait photos with vivid artistic effects drawn from style examples. Despite the availability of enormous training datasets and large network weights, existing methods struggle to maintain geometric consistency and achieve satisfactory stylization effects due to the disparity in facial feature distributions between facial photographs and stylized images, limiting their application to rare styles and mobile devices. To alleviate this, we propose to establish meaningful geometric correlations between portraits and style samples to simplify stylization by aligning corresponding facial characteristics. Specifically, we integrate differentiable Thin-Plate-Spline (TPS) modules into an end-to-end Generative Adversarial Network (GAN) framework to improve training efficiency and promote the consistency of facial identities. By leveraging inherent structural information of faces, e.g., facial landmarks, the TPS module can establish geometric alignments between the two domains, at global and local scales, in both pixel and feature spaces, thereby overcoming the aforementioned challenges. Quantitative and qualitative comparisons on a range of portrait stylization tasks demonstrate that our model not only outperforms existing models in terms of fidelity and stylistic consistency, but also achieves a 2x improvement in training data efficiency and 100x lower computational complexity, allowing our lightweight model to achieve real-time inference (30 FPS) at 512x512 resolution on mobile devices.
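The idea of landmark-driven geometric alignment can be illustrated with a least-squares affine fit, a deliberately simpler stand-in for the differentiable TPS modules in the paper (TPS adds non-rigid radial-basis terms on top of exactly this kind of correspondence-driven warp):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine alignment between two landmark sets.
    Illustrative stand-in: it shows how landmark correspondences
    define a warp, without the TPS non-rigid terms."""
    n = src.shape[0]
    A = np.hstack([src, np.ones((n, 1))])        # homogeneous coordinates
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)  # (3, 2) affine matrix
    return M

def apply_affine(pts, M):
    return np.hstack([pts, np.ones((pts.shape[0], 1))]) @ M

# four toy "landmarks" related by a known scale + shift
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
dst = src * 2.0 + np.array([1.0, -1.0])
M = fit_affine(src, dst)
warped = apply_affine(src, M)
```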
Neural Networks, vol. 191, art. 107797. Pub date: 2025-11-01. DOI: 10.1016/j.neunet.2025.107797
Practicing in quiz, assessing in quiz: A quiz-based neural network approach for knowledge tracing.
Shuanghong Shen, Qi Liu, Zhenya Huang, Linbo Zhu, Junyu Lu, Kai Zhang

Abstract: Online learning has demonstrated superiority in connecting high-quality educational resources to a global audience. To ensure an excellent learning experience with sustainable and opportune learning instructions, online learning systems must comprehend learners' evolving knowledge states based on their learning interactions, known as the Knowledge Tracing (KT) task. Generally, learners practice through various quizzes, each comprising several exercises that cover similar knowledge concepts. Therefore, their learning interactions are continuous within each quiz but discrete across different quizzes. However, existing methods overlook the quiz structure and assume all learning interactions are uniformly distributed. We argue that learners' knowledge states should also be assessed in quizzes, since they practiced in quizzes. To achieve this goal, we present a novel Quiz-based Knowledge Tracing (QKT) model, which effectively integrates the quiz structure of learning interactions. This is achieved by designing two distinct neural network modules: one for intra-quiz modeling and another for inter-quiz fusion. Extensive experimental results on public real-world datasets demonstrate that QKT achieves new state-of-the-art performance. The findings of this study suggest that incorporating the quiz structure of learning interactions can efficiently comprehend learners' knowledge states with fewer quizzes, and provide valuable insights into designing effective quizzes with fewer exercises.
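The premise that interactions are continuous within a quiz but discrete across quizzes suggests a quiz-keyed grouping step before intra-quiz modeling and inter-quiz fusion. This preprocessing sketch, including the record layout, is an assumption for illustration, not QKT's actual pipeline:

```python
from itertools import groupby

def group_by_quiz(interactions):
    """Group (quiz_id, exercise_id, correct) records by quiz, so an
    intra-quiz module can process each quiz as one continuous segment
    and an inter-quiz module can fuse across segments."""
    keyfn = lambda rec: rec[0]
    ordered = sorted(interactions, key=keyfn)  # stable: keeps intra-quiz order
    return {q: [r[1:] for r in recs] for q, recs in groupby(ordered, keyfn)}

log = [(1, "e1", 1), (1, "e2", 0), (2, "e3", 1), (2, "e4", 1), (1, "e5", 0)]
quizzes = group_by_quiz(log)
```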
Neural Networks, vol. 194, art. 108173. Pub date: 2025-10-04. DOI: 10.1016/j.neunet.2025.108173
High-speed olfactory perception with adaptive load balancing based on a laser array reservoir computing architecture.
Guizheng Guan, Bin Liu

Abstract: In the front-end information acquisition module of intelligent olfactory systems, the inherent cross-sensitivity of gas sensors presents a significant technical challenge. While sensor-array-based architectures have been established as an effective solution to address this limitation, the requirements for real-time detection in gas identification and concentration quantification have introduced a new challenge: the intrinsic multi-channel information processing demands of array systems lead to a dramatic increase in computational complexity. In this work, we propose a photonic reservoir computing (PRC) method for high-speed olfactory perception of mixed gases, leveraging the nonlinear mapping properties of semiconductor lasers and the inherent high-speed parallelism and low-energy characteristics of optical computing. A dimensional segmentation mechanism for multidimensional signals based on semiconductor laser arrays has been developed. By constructing a parallel PRC architecture, this mechanism enables distributed processing of multidimensional signals from gas sensor arrays, achieving adaptive matching between the number of activated lasers in the array and the internal feature dimensions required for computational load balancing. Numerical results indicate that the proposed system achieves high accuracy in gas classification tasks and concentration prediction performance comparable to current mainstream algorithms. This confirms the significant advantages of laser-array-based reservoirs in processing multivariable sensor data. The results provide a theoretical foundation for the development of physical RC systems oriented toward low-power rapid detection of mixed gases. With the integration and miniaturization of photonic technologies, it is promising to build miniaturized brain-inspired computing systems with rapid inference capability and dynamic adaptability, thus contributing to the advancement of electronic nose technology.
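The dimensional segmentation mechanism can be mimicked in software with one small echo-state reservoir per "laser", each fed a slice of the sensor channels. The tanh node and all sizes below are stand-ins for the semiconductor-laser dynamics, shown only to illustrate the parallel, load-balanced layout:

```python
import numpy as np

def reservoir_states(inputs, w_in, w_res, leak=0.3):
    """Minimal software echo-state reservoir standing in for one laser
    node; the photonic nonlinearity is far richer than tanh."""
    state = np.zeros(w_res.shape[0])
    states = []
    for u in inputs:
        state = (1 - leak) * state + leak * np.tanh(w_in @ u + w_res @ state)
        states.append(state.copy())
    return np.array(states)

rng = np.random.default_rng(0)
T, n_sensors, n_lasers, dim = 20, 6, 3, 8
signal = rng.standard_normal((T, n_sensors))
# dimensional segmentation: each "laser" reservoir gets 2 of the 6 channels
chunks = np.split(signal, n_lasers, axis=1)
all_states = np.hstack([
    reservoir_states(c,
                     rng.standard_normal((dim, c.shape[1])) * 0.5,
                     rng.standard_normal((dim, dim)) * 0.1)
    for c in chunks
])
# a linear readout trained on `all_states` would then classify the gases
```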
Neural Networks, vol. 194, art. 108192. Pub date: 2025-10-04. DOI: 10.1016/j.neunet.2025.108192
A continual test-time domain adaptation method for online machinery fault diagnosis under dynamic operating conditions.
Jinghui Tian, Yue Yu, Hamid Reza Karimi, Fei Gao, Jing Lin

Abstract: In practical industrial scenarios, monitoring data is collected in a streaming fashion under dynamic changes in operating conditions of mechanical systems, with continual covariate shift and label shift occurring in the collected data. Traditional transfer learning-based fault diagnosis methods typically involve pre-collecting substantial monitoring data for offline training and testing under static conditions. These approaches cannot adjust the model in real-time to continuous data shifts caused by dynamically changing conditions, resulting in a lack of adaptability and generalization. To overcome this practical challenge, a continual test-time domain adaptation (CTDA) approach with a teacher-student framework is developed for online machinery fault diagnosis under dynamic operating conditions in this study. Firstly, a class-balanced sampling mechanism is proposed to eliminate the impact of continual condition label shift by enforcing the model to learn from a uniform label distribution. Secondly, a joint positive-negative learning strategy is employed to guide model optimization and reduce the interference from pseudo-label noise. Lastly, the continual covariate shift is mitigated by performing knowledge alignment between the teacher and student models. Comprehensive experiments on four rotating machinery datasets demonstrate that the proposed method improves average diagnosis accuracy by 3.78% in handling dynamic industrial streaming data compared to existing fault diagnosis methods.
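The class-balanced sampling mechanism that counters label shift can be sketched as equal-per-class draws from per-(pseudo-)label buffers. The buffer layout and batch size here are assumptions for illustration, not the paper's exact mechanism:

```python
import random
from collections import defaultdict

def class_balanced_sample(stream, per_class):
    """Illustrative class-balanced sampler: buffer incoming (x, label)
    pairs per pseudo-label, then draw equally from each class so the
    adapted model sees a uniform label distribution despite label
    shift in the stream."""
    buffers = defaultdict(list)
    for x, label in stream:
        buffers[label].append(x)
    batch = []
    for label, items in buffers.items():
        k = min(per_class, len(items))
        batch += [(x, label) for x in random.sample(items, k)]
    return batch

random.seed(0)
# skewed stream: 9 samples of class 0, only 3 of class 1
stream = [(i, 0) for i in range(9)] + [(i, 1) for i in range(3)]
batch = class_balanced_sample(stream, per_class=3)
labels = [lab for _, lab in batch]
```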
Neural Networks, vol. 194, art. 108189. Pub date: 2025-10-03. DOI: 10.1016/j.neunet.2025.108189
WaveNet-SF: A hybrid network for retinal disease detection based on wavelet transform in spatial-frequency domain.
Jilan Cheng, Guoli Long, Zeyu Zhang, Zhenjia Qi, Hanyu Wang, Libin Lu, Shuihua Wang, Yudong Zhang, Jin Hong

Abstract: Retinal diseases are a leading cause of vision impairment and blindness, with timely diagnosis being critical for effective treatment. Optical Coherence Tomography (OCT) has become a standard imaging modality for retinal disease diagnosis, but OCT images often suffer from issues such as speckle noise, complex lesion shapes, and varying lesion sizes, making interpretation challenging. In this paper, we propose a novel model, WaveNet-SF, to enhance retinal disease detection by integrating spatial-domain and frequency-domain learning. The framework utilizes wavelet transforms to decompose OCT images into low- and high-frequency components, enabling the model to extract both global structural features and fine-grained details. To improve lesion detection, we introduce a Multi-Scale Wavelet Spatial Attention (MSW-SA) module, which enhances the model's focus on regions of interest at multiple scales. Additionally, a High-Frequency Feature Compensation (HFFC) block is incorporated to recover edge information lost during wavelet decomposition, suppress noise, and preserve fine details crucial for lesion detection. Our approach achieves state-of-the-art (SOTA) classification accuracies of 97.82% and 99.58% on the OCT-C8 and OCT2017 datasets, respectively, surpassing existing methods. These results demonstrate the efficacy of WaveNet-SF in addressing the challenges of OCT image analysis and its potential as a powerful tool for retinal disease diagnosis.
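The wavelet split into low- and high-frequency components that WaveNet-SF feeds to its two branches can be shown with a single-level 2-D Haar transform (the simplest wavelet; the abstract does not specify the paper's exact wavelet family, so Haar is an assumption here):

```python
import numpy as np

def haar_decompose(img):
    """Single-level 2-D Haar transform: LL carries the global,
    low-frequency structure; LH/HL/HH carry edges and fine detail.
    Assumes even height and width."""
    a = (img[0::2] + img[1::2]) / 2   # row averages (low-pass vertically)
    d = (img[0::2] - img[1::2]) / 2   # row differences (high-pass vertically)
    ll = (a[:, 0::2] + a[:, 1::2]) / 2
    lh = (a[:, 0::2] - a[:, 1::2]) / 2
    hl = (d[:, 0::2] + d[:, 1::2]) / 2
    hh = (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)  # smooth toy "image"
ll, lh, hl, hh = haar_decompose(img)
```

On this smooth ramp almost all energy lands in LL, while HH is zero; speckle noise and lesion edges in OCT would instead show up in the high-frequency subbands.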
Neural Networks, vol. 194, art. 108177. Pub date: 2025-10-03. DOI: 10.1016/j.neunet.2025.108177
Multi-view spectral clustering algorithm based on bipartite graph and multi-feature similarity fusion.
Shunyong Li, Kun Liu, Mengjiao Zheng, Liang Bai

Abstract: Multi-view clustering remains a challenging task due to the heterogeneity and inconsistency across multiple views. Most existing multi-view spectral clustering methods adopt a two-stage approach (constructing a fused spectral embedding matrix followed by k-means clustering), which often leads to information loss and suboptimal performance. Moreover, current graph and feature fusion strategies struggle to address view-specific discrepancies and label misalignment, while their high computational complexity hinders scalability to large datasets. To overcome these limitations, we propose a unified Multi-view Spectral Clustering algorithm based on Bipartite Graph and Multi-feature Similarity Fusion (BG-MFS). The proposed framework jointly integrates bipartite graph construction, multi-feature similarity fusion, and discrete clustering within a single optimization model, enabling mutual reinforcement among components. Furthermore, an entropy-based weighting mechanism is introduced to adaptively assess the contribution of each view. Extensive experiments demonstrate that BG-MFS consistently outperforms state-of-the-art methods in both clustering accuracy and computational efficiency.
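The entropy-based view weighting can be sketched as follows: views whose soft cluster assignments have lower entropy (more certainty) receive larger weights. The exp(-entropy) rule below is an assumption used to show the idea; BG-MFS's exact weighting may differ:

```python
import numpy as np

def entropy_view_weights(view_probs, eps=1e-12):
    """Illustrative entropy weighting over views: compute the mean
    row entropy of each view's soft cluster assignments, then weight
    views by exp(-entropy) and normalize to sum to one."""
    weights = []
    for p in view_probs:
        h = -np.sum(p * np.log(p + eps), axis=1).mean()  # mean row entropy
        weights.append(np.exp(-h))                       # low entropy -> high weight
    w = np.array(weights)
    return w / w.sum()

confident = np.array([[0.90, 0.10], [0.95, 0.05]])  # near one-hot view
uncertain = np.array([[0.50, 0.50], [0.50, 0.50]])  # ambiguous view
w = entropy_view_weights([confident, uncertain])
```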
Neural Networks, vol. 194, art. 108187. Pub date: 2025-10-02. DOI: 10.1016/j.neunet.2025.108187
CEVG-RTNet: A real-time architecture for robust forest fire smoke detection in complex environments.
Jun Wang, Chunman Yan

Abstract: Forest fire smoke detection is crucial for early warning and emergency management, especially under complex environmental conditions such as low contrast, high transparency, background interference, low illumination, occlusion, and overlapping smoke sources. These factors significantly hinder detection accuracy in real-world scenarios. To address these challenges, we propose CEVG-RTNet, a real-time forest fire smoke detection architecture designed to enhance robustness under such complex conditions. CEVG-RTNet incorporates several novel components. The Spatial-Channel Priori Perceptual Convolution (SCPP-Conv) module improves the model's ability to localize smoke and perceive its morphology, even in low-contrast and high-transparency environments. The Hierarchical Residual Feature Alignment (HRFA) module addresses the challenge of multi-scale feature extraction by aligning local and large-scale smoke features through a residual-guided alignment strategy and multi-layer perceptron (MLP)-based aggregation. To further refine dynamic smoke detection, the Dynamic Recursive Feature Enhancement (DRFE) module applies recursive channel adaptive enhancement and cross-channel attention strategies. Additionally, Polygonal-Intersection over Union (PolyIoU) Loss, a novel loss function, is introduced to handle the morphological complexity of smoke regions. The architecture leverages a graph sparse attention mechanism to enhance accuracy without excessive computational cost. Experimental results demonstrate the effectiveness of CEVG-RTNet, with the variant CEVG-RTNet-n achieving 89.1% precision, 82.9% recall, mAP@0.5 of 89%, and mAP@0.5:0.95 of 58.9%. The model operates with 3.04M parameters, 6.6G FLOPs, and 99.42 FPS, showcasing its strong generalization, anti-interference capabilities, and suitability for complex forest fire smoke detection. The source code is available at: https://github.com/CNNanmuzi/CEVG-RTNet.
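PolyIoU generalizes IoU to polygonal smoke regions. As a simplified stand-in, the standard axis-aligned box IoU loss below shows the overlap-over-union principle it builds on; the polygon intersection itself is exactly the part this sketch omits:

```python
def box_iou(a, b):
    """Axis-aligned IoU, a simplified stand-in for PolyIoU (which
    operates on polygons). Boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(a, b):
    """Loss is 1 - IoU: zero for a perfect match, one for no overlap."""
    return 1.0 - box_iou(a, b)

perfect = iou_loss((0, 0, 2, 2), (0, 0, 2, 2))  # identical boxes
partial = iou_loss((0, 0, 2, 2), (1, 0, 3, 2))  # half-overlapping boxes
```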