{"title":"DPCCNN: A new lightweight fault diagnosis model for small samples and high noise problem","authors":"Jiabin Zhang, Zhiqian Zhao, Yinghou Jiao, Runchao Zhao, Xiuli Hu, Renwei Che","doi":"10.1016/j.neucom.2025.129526","DOIUrl":"10.1016/j.neucom.2025.129526","url":null,"abstract":"<div><div>Obtaining large amounts of industrial data in real industrial processes is usually difficult. Meanwhile, it is difficult for deep learning-based lightweight fault diagnosis networks to obtain reliability diagnosis performance in the presence of high noise. To address these limitations, a new lightweight convolutional neural network (CNN) fault diagnosis framework, the dilated perceptually coupled convolutional neural network (DPCCNN), has been proposed for small samples and high noise problems. First, dilated layered interactive convolutional module (DLICM) is designed to obtain strong feature extraction capability, enhance the receptive field of small convolutional kernels by dilated convolution, and the self-attention mechanism compensates for the weak interactivity of deep convolution, which greatly reduces the number of parameters and computation of the model. Second, the global aggregation block (GAB) is designed to extract the contextual information of the feature map, which can focus on the basic contextual information without extensive computational requirements. The performance of this method is verified to be better than the current popular fault diagnosis models by the public dataset and the self-constructed rotor dataset, which still has good noise resistance in a lightweight framework and maintains high stability in small sample scenarios.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"626 ","pages":"Article 129526"},"PeriodicalIF":5.5,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143314288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: AMT-Net: Attention-based multi-task network for scene depth and semantics prediction in assistive navigation
Authors: Yunjia Lei, Joshua Luke Thompson, Son Lam Phung, Abdesselam Bouzerdoum, Hoang Thanh Le
Neurocomputing, Volume 625, Article 129468. Pub Date: 2025-02-01. DOI: 10.1016/j.neucom.2025.129468
Abstract: Traveling safely and independently in unfamiliar environments remains a significant challenge for people with visual impairments. Conventional assistive navigation systems, while aiming to enhance spatial awareness, typically handle crucial tasks like semantic segmentation and depth estimation separately, resulting in high computational overhead and reduced inference speed. To address this limitation, we introduce AMT-Net, a novel multi-task deep neural network designed for joint semantic segmentation and monocular depth estimation. AMT-Net is designed with a single unified decoder, which boosts not only the model's efficiency but also its scalability on portable devices with limited computational resources. We propose two self-attention-based modules, CSAPP and RSAB, to leverage the strengths of convolutional neural networks for extracting robust local features and Transformers for capturing essential long-range dependencies. This design enhances the ability of our model to interpret complex scenes effectively. Furthermore, AMT-Net has low computational complexity and achieves real-time performance, making it suitable for assistive navigation applications. Extensive experiments on the public NYUD-v2 dataset and the TrueSight dataset demonstrated our model's state-of-the-art performance and the effectiveness of the proposed components.
Title: A transformer-based dual contrastive learning approach for zero-shot learning
Authors: Yu Lei, Ran Jing, Fangfang Li, Quanxue Gao, Cheng Deng
Neurocomputing, Volume 626, Article 129530. Pub Date: 2025-02-01. DOI: 10.1016/j.neucom.2025.129530
Abstract: The goal of zero-shot learning is to utilize attribute information for seen classes so as to generalize the learned knowledge to unseen classes. However, current algorithms often overlook the fact that the same attribute may exhibit different visual features across domains, leading to domain shift issues when transferring knowledge. Furthermore, in terms of visual feature extraction, networks such as ResNet are not effective at capturing global information from images, which adversely impacts recognition accuracy. To address these challenges, we propose an end-to-end Transformer-Based Dual Contrastive Learning Approach (TFDNet) for zero-shot learning. The network leverages the Vision Transformer (ViT) to extract visual features and includes an attribute localization mechanism to identify the regions most relevant to the given attributes. It then employs a dual contrastive learning method as a constraint, optimizing the learning process to better capture global feature representations. The proposed method makes the classifier more robust and enhances its ability to discriminate and generalize to unseen classes. Experimental results on three public datasets demonstrate the superiority of TFDNet over current state-of-the-art algorithms, validating its effectiveness in the field of zero-shot learning.
Title: VCF: An effective Vision-Centric Framework for Visual Question Answering
Authors: Fengjuan Wang, Longkun Peng, Shan Cao, Zhaoqilin Yang, Ruonan Zhang, Gaoyun An
Neurocomputing, Volume 625, Article 129536. Pub Date: 2025-02-01. DOI: 10.1016/j.neucom.2025.129536
Abstract: Recently, the wide application of large language models (LLMs) in Visual Question Answering (VQA) has significantly boosted progress in this field. Despite these advancements, LLMs cannot fully perceive and comprehend the visual information in an image. How to fully mine visual information is therefore very important for language models to handle the VQA task effectively. In response to this challenge, we propose a straightforward yet effective Vision-Centric Framework (VCF) for VQA, which mainly comprises an adaptive visual perceptron module, a multi-source feature fusion module, and a large language model. The adaptive visual perceptron module condenses and integrates the long visual information sequence from the visual encoder output using a fixed number of query embeddings. The multi-source feature fusion module concentrates on extracting fine-grained visual perception information by fusing visual features of different scales. Finally, by channeling their outputs, the language model leverages its extensive implicit knowledge to produce a more nuanced and precise synthesis of visual information, ultimately delivering the answer. The synergy and complementarity of the two modules jointly enhance the robustness of the model. Through extensive experiments, VCF achieves nearly state-of-the-art results on datasets such as VQAv2, OK-VQA, GQA, and Text-VQA. A series of ablation experiments demonstrates the efficacy of the proposed modules. Additionally, VCF achieves better or equivalent performance compared to some larger-scale models, such as LLaVa-1.5 and Pink.
Title: MPLR-CapsNet: A novel capsule network with multi-line parallel features and logical reasoning for occluded pedestrian detection
Authors: Jingwei Cao, Guoyang Hou, Jiawang Lv, Tianlin Gao, Liming Di, Chengtao Zhang
Neurocomputing, Volume 625, Article 129595. Pub Date: 2025-01-31. DOI: 10.1016/j.neucom.2025.129595
Abstract: Occluded pedestrian detection is a key technology in computer vision. The occluded parts of pedestrians are random, and diverse clothing, postures, and varying scales make occluded pedestrian detection a challenging task. To address these issues, we propose a novel capsule network model, MPLR-CapsNet, based on multi-line parallel features and logical reasoning, aiming to achieve accurate detection of occluded pedestrians in dense scenes. The network introduces an RS module to learn diversified key feature information in a directional manner and uses a Mix-squash function to enhance the model's anti-interference and generalization abilities. Furthermore, a local pruning strategy is adopted to retain only the important low-level capsules connected to high-level capsules in the dynamic routing stage, effectively reducing the number of training parameters. In addition, we propose a capsule reasoning module based on the hyper-sausage metric model in the reasoning detection stage, transforming the object category and coordinates into attribute representations of capsule entities, which significantly enhances the autonomous reasoning and discrimination abilities of the network. Extensive experiments on two standard occluded pedestrian datasets validate the proposed method, which outperforms existing state-of-the-art solutions. Our model achieves Top-1 accuracies of 97.6 % and 94.5 % on the CityPersons and CrowdHuman datasets, respectively.
Title: Research on efficient data processing method for fog computing based on blockchain and federated learning
Authors: Haiyan Kang, Bing Wu, Yiran Cao
Neurocomputing, Volume 626, Article 129529. Pub Date: 2025-01-31. DOI: 10.1016/j.neucom.2025.129529
Abstract: Federated learning provides an effective distributed machine learning solution for "centralized" processing of decentralized data. However, the increasing size of data and complexity of learning models in the context of digitalization place higher demands on the computing power of training equipment. To address these problems, a Fog Computing Blockchain Federated Learning (FC-BCFL) method is proposed to enable secure and efficient processing of massive data. First, an incentive mechanism based on a time reputation value and a loss reputation value is designed with the help of blockchain to motivate clients with high-quality data to join the training and improve model accuracy. Second, a fog node selection mechanism based on the time reputation value is proposed to select efficient fog nodes that train local models at the edge of the network using client data perturbed by local differential privacy, in order to shorten federated learning training time and improve training efficiency. In addition, a model aggregation algorithm, LR-FedAvg, is designed, which weights the local model update parameters according to the loss reputation value to increase the weight of high-precision local parameters in the global parameters, thereby reducing the number of training rounds and reaching the converged global optimal model faster. Finally, comparative experiments were conducted on the MNIST, Fashion MNIST, and CIFAR-10 datasets for three variables: training rounds, per-iteration time, and total training time. The results show that FC-BCFL improves global model accuracy while reducing training rounds and training time, demonstrating that the model can learn efficiently while preserving data privacy and that a higher-accuracy model can be obtained with relatively fewer training rounds and less iteration time. This verifies the effectiveness of the proposed FC-BCFL method.
Title: Art creator: Steering styles in diffusion model
Authors: Shan Tang, Wenhua Qian, Peng Liu, Jinde Cao
Neurocomputing, Volume 626, Article 129511. Pub Date: 2025-01-31. DOI: 10.1016/j.neucom.2025.129511
Abstract: Large-scale text-to-image (T2I) generative models are extensively used in the art and creative industries because of their remarkable capability to generate high-quality images. Generating the ideal image in a single attempt is nearly impossible, necessitating complex and precise post-image editing. However, stylization poses significant challenges in post-editing. In this context, we introduce Art Creator, which facilitates style control based on a simple description or a single image. Art Creator enables nuanced image style edits; alterations in painting materials, colors, and brushstrokes; and understanding of high-level attributes such as object shapes. Furthermore, we manually annotated and released a dataset named ChinArt, comprising over 20,000 eastern artworks, aiming to address the gap in the global art creation domain. We showcase the quality and efficiency of our method across various art style creations.
Title: Interpretable modelling and visualization of biomedical data
Authors: S. Ghosh, E.S. Baranowski, M. Biehl, W. Arlt, P. Tiňo, K. Bunte
Neurocomputing, Volume 626, Article 129405. Pub Date: 2025-01-30. DOI: 10.1016/j.neucom.2025.129405
Abstract: Applications of interpretable machine learning (ML) techniques on medical datasets facilitate early and fast diagnoses, along with deeper insight into the data. Furthermore, the transparency of these models increases trust among application domain experts. Medical datasets face common issues such as heterogeneous measurements, imbalanced classes with limited sample size, and missing data, which hinder the straightforward application of ML techniques. In this paper we present a family of prototype-based (PB) interpretable models which are capable of handling these issues. Moreover, we propose a strategy for harnessing the power of ensembles while maintaining the intrinsic interpretability of the PB models, by averaging over the model parameter manifolds. All the models were evaluated on a synthetic, publicly available dataset, in addition to detailed analyses of two real-world medical datasets (one publicly available). The models and strategies we introduce address the challenges of real-world medical data while remaining computationally inexpensive and transparent. Moreover, they exhibit similar or superior performance compared to alternative techniques.
Title: Permutation invariant self-attention infused U-shaped transformer for medical image segmentation
Authors: Sanjeet S. Patil, Manojkumar Ramteke, Anurag S. Rathore
Neurocomputing, Volume 625, Article 129577. Pub Date: 2025-01-30. DOI: 10.1016/j.neucom.2025.129577
Abstract: The size and shape of organs in the human body vary according to factors like genetics, body size, proportions, health, lifestyle, gender, ethnicity, and race. Further, abnormalities due to cancer and chronic diseases also affect the size of organs and tumors. Moreover, the spatial location and area of these organs deviate along the transverse plane (Z plane) of medical scans. Therefore, the generalizability and robustness of a computer vision framework over medical images can be improved if the framework is also encouraged to learn representations of the target areas regardless of their spatial location in the input images. Hence, we propose a novel permutation invariant multi-headed self-attention (PISA) module to reduce the sensitivity of a U-shaped transformer-based architecture, Swin-UNet, towards permutation. We infuse this module into the skip connections of our architecture. We achieve a mean Dice score of 79.25 on the segmentation of 8 abdominal organs, better than most state-of-the-art algorithms. Moreover, we analyze the generalizability of our architecture over publicly available multi-sequence cardiac MRI datasets. When tested on a sequence unseen by the model during training, improvements of 25.1 % and 9.0 % in Dice scores were observed in comparison to a pure-CNN-based algorithm and a pure transformer-based architecture, respectively, thereby demonstrating its versatility. Replacing the self-attention module in a U-shaped transformer architecture with our permutation invariant self-attention module produced noteworthy segmentations over shuffled test images, even though the module was trained solely on normal images. The results demonstrate the enhanced efficiency of the proposed module in imparting attention to target organs irrespective of their spatial positions.
Title: ViT-CAPS: Vision transformer with contrastive adaptive prompt segmentation
Authors: Khawaja Iftekhar Rashid, Chenhui Yang
Neurocomputing, Volume 625, Article 129578. Pub Date: 2025-01-30. DOI: 10.1016/j.neucom.2025.129578
Abstract: Real-time segmentation plays an important role in numerous applications, including autonomous driving and medical imaging, where accurate and instantaneous segmentation influences essential decisions. Previous approaches suffer from a lack of cross-domain transferability and a need for large amounts of labeled data, which prevent them from being applied successfully to real-world scenarios. This study presents a new model, ViT-CAPS, that utilizes Vision Transformers in the encoder to improve segmentation performance in challenging and large-scale scenes. We employ an Adaptive Context Embedding (ACE) module, incorporating contrastive learning to improve domain adaptation by matching features from support and query images. In addition, a Meta Prompt Generator (MPG) is designed to generate prompts from the aligned features, enabling segmentation in complicated environments without requiring much human input. ViT-CAPS shows promising results in resolving domain-shift problems and improving few-shot segmentation in dynamic, low-annotation settings. We conducted extensive experiments on four well-known datasets, FSS-1000, Cityscapes, ISIC, and DeepGlobe, and achieved noteworthy performance: gains of 4.6 % on FSS-1000, 4.2 % on DeepGlobe, and 6.1 % on Cityscapes, with a slight drop of 3 % on the ISIC dataset, compared to previous approaches. We achieved average mean IoUs of 60.52 and 69.3, which are 2.7 % and 5.1 % higher than state-of-the-art Cross-Domain Few-Shot Segmentation (CD-FSS) models in the 1-shot and 5-shot settings, respectively.