Neurocomputing | Pub Date: 2025-07-24 | DOI: 10.1016/j.neucom.2025.131047
Chaojun Lin, Ying Shi, Gang Wang, Shijian Liu
{"title":"Improving vision-language models through intra-modal contrastive learning-based hard sample mining","authors":"Chaojun Lin , Ying Shi, Gang Wang , Shijian Liu","doi":"10.1016/j.neucom.2025.131047","DOIUrl":"10.1016/j.neucom.2025.131047","url":null,"abstract":"<div><div>Driving environmental perception is a core component of autonomous driving systems. Recently, emerging vision-language detectors, known for their superior detection accuracy, have gradually replaced traditional detectors and have been increasingly applied in open-world driving scenarios. However, these detectors still face challenges regarding the missed detection of hard positive samples. This study identifies that a primary cause of this problem is the rejection of hard samples due to their low cross-modal consistency. To address this challenge, this work proposes a contrastive learning strategy based on a hard sample prototype memory bank to recall the potential positive samples. Additionally, to enhance the representational capacity of the detection network, an instance-level contrastive learning loss is introduced. This loss aligns the feature representations of the same instance across the deep and shallow network layers, thereby improving the ability of shallow layers to extract features from hard samples. Experimental results demonstrate that the proposed method achieves outstanding detection accuracy and is highly effective in complex urban road scenarios. 
The code and trained models are available at https://github.com/unbelieboomboom/HSMG_DINO.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131047"},"PeriodicalIF":5.5,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-07-22 | DOI: 10.1016/j.neucom.2025.131051
Yulin Wu, Xiaochen Wang, Dengshi Li, Ruimin Hu
{"title":"Multimodal and multichannel speech separation using location-guided speech feature mapping network","authors":"Yulin Wu , Xiaochen Wang , Dengshi Li , Ruimin Hu","doi":"10.1016/j.neucom.2025.131051","DOIUrl":"10.1016/j.neucom.2025.131051","url":null,"abstract":"<div><div>In reality, the audio and visual signals of sound sources are closely aligned, working collaboratively to isolate the desired speech signal from overlapping voices of simultaneous talkers. To leverage the complementarity and utilize all available information from both auditory and visual sources in speech separation, we propose a novel robust multimodal and multichannel speech separation method, without requiring known camera parameters. The proposed method exploits the complementarity of audio and visual modalities to estimate the speaker’s location and adopts a location-guided speech feature mapping strategy, wherein the attention mechanism fusion method combines the high-level semantic information of auditory and visual sources, aiding in the separation of target speech with its corresponding directional features. Experimental results suggest that the proposed multimodal and multichannel speech separation system outperforms the baselines, demonstrating improvements of 0.64 <em>dB</em> in SI-SDR and 0.17 in PESQ, respectively. 
The proposed system consistently outperformed the baselines by achieving a 10.14 % absolute (26.05 % relative) word error rate (WER) reduction.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131051"},"PeriodicalIF":5.5,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-07-21 | DOI: 10.1016/j.neucom.2025.131023
Xue Weimin, Liu Yisha, Zhuang Yan
{"title":"A weight-sharing based RGB-T image semantic segmentation network with hierarchical feature enhancement and progressive feature fusion","authors":"Xue Weimin , Liu Yisha , Zhuang Yan","doi":"10.1016/j.neucom.2025.131023","DOIUrl":"10.1016/j.neucom.2025.131023","url":null,"abstract":"<div><div>RGB-T image segmentation algorithms have been widely adopted in various fields, such as surveillance and autonomous driving. These algorithms typically employ separate encoders to ensure each branch extracts modality-specific features. Nevertheless, this design increases parameters and potential conflicts between multimodal features. An alternative solution is using a weight-sharing encoder, which facilitates consistent encoding across different data types, reducing the number of training parameters and enhancing the encoder’s generalization capability. However, a weight-sharing encoder tends to extract modality-shared features but under-represent modality-specific features, thereby limiting segmentation performance under heterogeneous sensor conditions. To preserve the modality-shared features of multimodal data while simultaneously enhancing modality-specific features, we propose a novel Weight-Sharing based RGB-T image semantic segmentation network (WSRT) with a Hierarchical Feature Enhancement Module (HFEM) and Progressive Fusion Decoder (PFD). HFEM first integrates the modality-shared information from the weight-sharing encoded feature maps to generate enhanced modality-shared feature maps. Subsequently, it utilizes these enhanced feature maps to generate relative modality-specific feature maps of the RGB and thermal modalities. PFD is proposed to progressively integrate multi-scale features from different stages to decode these enhanced features more effectively. Experimental results on multiple RGB-T image semantic segmentation datasets demonstrate that our method achieves top-ranking performance or competitive results. 
The code is available at https://github.com/bearxwm/WSRT.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131023"},"PeriodicalIF":5.5,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-07-21 | DOI: 10.1016/j.neucom.2025.131060
Ge Song, Lianzheng Su, Xinmiao Wang, Zhonghao Huang, Shian Wang, Qiuyue Fu, Peng Wang
{"title":"Deep learning-based chromosome segmentation and extraction: A comprehensive review of methodologies, challenges, and future directions","authors":"Ge Song , Lianzheng Su , Xinmiao Wang , Zhonghao Huang , Shian Wang , Qiuyue Fu , Peng Wang","doi":"10.1016/j.neucom.2025.131060","DOIUrl":"10.1016/j.neucom.2025.131060","url":null,"abstract":"<div><div>Chromosome karyotyping is fundamental to cytogenetics, facilitating the diagnosis of genetic disorders and malignancies through detailed structural analysis of chromosomes. A major technical challenge is the precise segmentation and extraction of complete, non-overlapping chromosomes, especially in cases involving dense chromosome clusters or significant morphological variation. Although deep learning has achieved notable success in general image processing, its application to chromosomal analysis has only recently gained momentum, and comprehensive evaluations remain scarce. This review systematically examines recent advances in deep learning-based chromosome segmentation and extraction, summarizing prevailing methodologies and key limitations. It traces the evolution from early convolutional neural networks to encoder-decoder architectures and generative models, highlighting advances in spatial detail recovery, robustness against overlapping structures, and domain adaptation. Furthermore, the paper categorizes chromosomal segmentation into semantic, instance, and hybrid paradigms, elucidates methodological trends such as the incorporation of biological priors and the adoption of multi-task learning, and discusses practical and cognitive challenges that hinder clinical implementation. 
By providing a comprehensive overview and outlining future directions—including explainable AI and synthetic data augmentation—this work aims to accelerate the development of intelligent, fully automated chromosome karyotyping systems.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131060"},"PeriodicalIF":5.5,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-07-21 | DOI: 10.1016/j.neucom.2025.131007
Abdullah Abdul Sattar Shaikh, Saeed Samet
{"title":"Federated graph neural networks in non-IID scenarios—A comprehensive survey","authors":"Abdullah Abdul Sattar Shaikh, Saeed Samet","doi":"10.1016/j.neucom.2025.131007","DOIUrl":"10.1016/j.neucom.2025.131007","url":null,"abstract":"<div><div>Federated Graph Neural Networks (FedGNNs) have emerged as a promising solution to securely train structured graph data in Federated learning (FL) settings. In this paper, we present one of the first works that categorizes the various non-IID (Non-Identically and Independently Distributed) scenarios and challenges occurring in FedGNNs, offering insights into horizontal and vertical non-IID cases. Horizontal non-IID refers to variations in data distributions among clients, while vertical non-IID involves attribute and label disparities within the clients. We briefly discuss works addressing these scenarios and their respective advantages and disadvantages. Additionally, we explore other approaches like centralized and decentralized methods, in mitigating non-IID effects, highlighting their benefits in terms of shared knowledge, privacy preservation, and scalability. Furthermore, we emphasize the importance of evaluating and quantifying non-IIDness in graph data through statistical measures. Our work contributes to the understanding of FedGNNs’ applicability in healthcare, finance, recommender systems, transportation, and other domains. 
We also identify future research directions, such as taxonomy development, handling complete structural heterogeneity, and exploring adaptive mechanisms, to enhance the robustness and reliability of FedGNNs in real-time scenarios.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131007"},"PeriodicalIF":5.5,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-07-19 | DOI: 10.1016/j.neucom.2025.130904
Mingfei Lu, Shujian Yu, Robert Jenssen, Badong Chen
{"title":"Generalized Cauchy–Schwarz divergence: Efficient estimation and applications in deep learning","authors":"Mingfei Lu , Shujian Yu , Robert Jenssen , Badong Chen","doi":"10.1016/j.neucom.2025.130904","DOIUrl":"10.1016/j.neucom.2025.130904","url":null,"abstract":"<div><div>Divergence measures play a fundamental role in machine learning and deep learning; however, efficient methods for handling multiple distributions (i.e., more than two) remain largely underexplored. This challenge is particularly critical in scenarios where managing multiple distributions simultaneously is both necessary and unavoidable, such as clustering, multi-source domain adaptation, and multi-view learning. A common approach to quantifying overall divergence involves computing the mean pairwise distances between distributions. However, this method suffers from two key limitations. First, it is restricted to pairwise comparisons and fails to capture higher-order interactions or dependencies among three or more distributions. Second, its implementation requires a double-loop traversal over all distribution pairs, leading to significant computational overhead, particularly when dealing with a large number of distributions. In this study, we introduce the generalized Cauchy–Schwarz divergence (GCSD), a novel divergence measure specifically designed for multiple distributions. To facilitate its practical application, we propose a kernel-based closed-form sample estimator, which enables efficient computation in various deep-learning contexts. Furthermore, we validate GCSD through two representative tasks: deep clustering, achieved by maximizing the generalized divergence between clusters, and multi-source domain adaptation, achieved by minimizing the generalized discrepancy among feature distributions. 
Extensive experimental evaluations highlight the robustness and effectiveness of GCSD in both tasks, underscoring its potential to advance machine learning techniques that require the quantification of multiple distributions.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 130904"},"PeriodicalIF":5.5,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-07-19 | DOI: 10.1016/j.neucom.2025.131022
Ranmeng Lin, Runda Jia, Fengyang Jiang, Jun Zheng, Dakuo He, Kang Li, Fuli Wang
{"title":"Safe coordinated optimization of the thickening-dewatering process via reinforcement learning with real-time human guidance","authors":"Ranmeng Lin , Runda Jia , Fengyang Jiang , Jun Zheng , Dakuo He , Kang Li , Fuli Wang","doi":"10.1016/j.neucom.2025.131022","DOIUrl":"10.1016/j.neucom.2025.131022","url":null,"abstract":"<div><div>Due to its trial-and-error learning mechanism and limited intelligence, current reinforcement learning (RL) faces significant safety risks when applied to complex industrial scenarios. To improve its deployability in high-risk environments, this paper takes the thickening-dewatering process, a key and energy-intensive subprocess in mineral processing, as the research object and proposes a safe RL coordination optimization framework that leverages real-time human guidance mechanisms. The framework consists of two human-in-the-loop models: first, a human supervision model based on soft sensing, which predicts the safety of the agent’s actions at each step and identifies potential risks in advance; second, a human demonstration model based on imitation learning, which automatically generates safe alternative actions in line with human expertise when unsafe actions are detected. Finally, the safe actions, evaluated and filtered by the models, are used for interaction with the environment to ensure the safety of the RL process. Furthermore, this paper derives the upper bound of the discounted failure probability for the algorithm, theoretically validating the safety enhancement provided by the human guidance mechanism. 
Experimental results demonstrate that, while achieving a 100 % training safety rate, the proposed algorithm reduces energy consumption by 15.62 % compared to an existing optimization algorithm, showing significant potential for practical application and broader deployment.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131022"},"PeriodicalIF":5.5,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-07-19 | DOI: 10.1016/j.neucom.2025.131025
Ling Liu, Wang Xu, Pan Zhou, Xiaoqiong Xu, Xi Chen, Hongfang Yu, Gang Sun
{"title":"Optimizing global parameter synchronization for geo-distributed machine learning in reconfigurable optical wide area networks","authors":"Ling Liu , Wang Xu , Pan Zhou , Xiaoqiong Xu , Xi Chen , Hongfang Yu , Gang Sun","doi":"10.1016/j.neucom.2025.131025","DOIUrl":"10.1016/j.neucom.2025.131025","url":null,"abstract":"<div><div>Geo-distributed machine learning (Geo-DML) usually uses a hierarchical training architecture: local parameter synchronization (LPS) within each data center and global parameter synchronization (GPS) between data centers. Compared to fast LAN bandwidth, the heterogeneous and scarce WAN bandwidth becomes one of the main bottlenecks of training performance for Geo-DML. Fortunately, emerging optical technologies render the modern WAN topology reconfigurable, which has been adopted to improve the performance of some traditional traffic with the help of software-defined networking (SDN). However, the reconfigurable WAN topology is often overlooked by most schemes aimed at accelerating Geo-DML. In this paper, we propose <em>AdaptivePS</em>, an adaptive global parameter synchronization scheduling scheme that leverages the reconfigurable feature of WAN topology and the training characteristics to speed up Geo-DML training. Specifically, mathematical optimization models considering the topology construction and parameter synchronization scheduling are first established. Then <em>AdaptivePS</em> solves the mathematical models through a <em>relaxing</em> and <em>deterministic rounding</em> scheme, obtaining the deployment of global aggregation nodes, wavelength allocation, and path and rate allocation. 
The simulation results based on real WAN topologies show that compared to <em>RoWAN</em>, <em>RAPIER</em> and <em>Baseline</em>, <em>AdaptivePS</em> can reduce global communication time (GCT) by up to 73.4 %, 86.7 %, and 96.2 %, respectively. This demonstrates that <em>AdaptivePS</em> can effectively cope with different network environments, with the help of adaptive selection of global aggregation nodes, reconfigurable topology, and mathematical-model-based scheduling.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131025"},"PeriodicalIF":5.5,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-07-19 | DOI: 10.1016/j.neucom.2025.130908
Miao Shu, Qiankun Song, Yurong Liu
{"title":"Stability of delayed quaternion-valued neural networks with general probabilistic bounded Markovian switching","authors":"Miao Shu , Qiankun Song , Yurong Liu","doi":"10.1016/j.neucom.2025.130908","DOIUrl":"10.1016/j.neucom.2025.130908","url":null,"abstract":"<div><div>The study examines the stability of quaternion-valued neural networks (QVNNs) with time-varying discrete delays and distributed delays as well as general probabilistic bounded Markovian switching. The non-commutative nature of quaternion multiplication complicates theoretical analysis and numerical computation when decomposition methods are employed. To address this, a suitable Lyapunov-Krasovskii functional is constructed and combined with the free-weighting matrix method and inequality techniques, so that the QVNNs with Markovian switching are analyzed as a whole, yielding stability criteria in the form of linear matrix inequalities (LMIs). Additionally, because the transition probabilities are general probabilistic bounded, the system becomes more versatile and realistic. Finally, two examples with simulations are given to show the validity and applicability of the obtained results.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"651 ","pages":"Article 130908"},"PeriodicalIF":5.5,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144672036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-07-19 | DOI: 10.1016/j.neucom.2025.131011
D. Minola Davids, A. Arul Edwin Raj, C. Seldev Christopher
{"title":"SportSummarizer: A unified multimodal fusion transformer for context-aware sports video summarization","authors":"D. Minola Davids , A. Arul Edwin Raj , C. Seldev Christopher","doi":"10.1016/j.neucom.2025.131011","DOIUrl":"10.1016/j.neucom.2025.131011","url":null,"abstract":"<div><div>Automated sports video summarization faces critical challenges due to the complexity of dynamic gameplay, event variability, and the intricate rules governing sports like cricket and soccer. Existing methods often struggle to capture key moments accurately, resulting in false positives, redundant content like replays, ineffective multimodal data integration, and challenges in spatio-temporal modeling and semantic event understanding. To overcome these limitations, a novel Unified Multimodal Fusion Transformer is proposed for the summarization of cricket and soccer videos. This approach utilizes advanced feature encoding across multiple modalities: ViViT for video, OpenL3 for audio, and DistilBERT for text, ensuring robust multimodal representations. A multimodal fusion transformer with contextual cross-quadrimodal attention is introduced to address weak multimodal integration, enabling the model to capture complex interactions across visual, audio, and textual data for precise event detection. Further, a Hierarchical Temporal Convolutional Network (Hierarchical TCN) module integrates hierarchical temporal and metadata-enhanced positional encodings to model both short- and long-term game sequences effectively. Additionally, replay and redundancy elimination mechanisms are applied to remove repetitive content, generating concise and high-quality video summaries that reflect the game's critical moments. The proposed method achieves state-of-the-art results, with the highest precision (99.2 %) and recall (98.9 %), and a low error rate (4 %). 
It also demonstrates superior ROC-AUC performance (0.88) and maintains peak accuracy (89.5 %) with strong performance in mIoU (0.82) and highlight diversity (0.93), highlighting its robustness across various event detection metrics.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131011"},"PeriodicalIF":5.5,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}