{"title":"Efficient Bayesian CNN Model Compression using Bayes by Backprop and L1-Norm Regularization","authors":"Ali Muhammad Shaikh, Yun-bo Zhao, Aakash Kumar, Munawar Ali, Yu Kang","doi":"10.1007/s11063-024-11593-1","DOIUrl":"https://doi.org/10.1007/s11063-024-11593-1","url":null,"abstract":"<p>The rapid adoption of convolutional neural networks (CNNs) in numerous real-world applications has driven up both computational cost and model size. To address these issues, many researchers have focused on compressing the original CNN models by pruning weights or filters. Filter pruning has an advantage over weight pruning because it does not result in sparse connectivity patterns. In this work, we propose a Bayesian Convolutional Neural Network (BayesCNN) with variational inference, which places a probability distribution over the weights. To prune the Bayesian CNN, we combine the L1-norm with the capped L1-norm to characterize the amount of information that can be extracted through each filter and to control regularization. With this formulation, we prune unimportant filters directly without any loss of test accuracy and obtain a slimmer model with comparable accuracy. The pruning process is iterative, and to validate the proposed approach we evaluate several CNN architectures on standard classification datasets. 
We compare our results with non-Bayesian CNN models; in particular, for VGG-16 on CIFAR-10 we prune 75.8% of the parameters and reduce floating-point operations (FLOPs) by 51.3% without loss of accuracy, improving on the state of the art.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"61 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140571985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Adaptive Robust Modularized Semi-Supervised Community Detection Method Based on Non-negative Matrix Factorization","authors":"","doi":"10.1007/s11063-024-11588-y","DOIUrl":"https://doi.org/10.1007/s11063-024-11588-y","url":null,"abstract":"<h3>Abstract</h3> <p>Community detection methods are the most widely used tools for categorizing complex networks. One of the most common methods for unsupervised and semi-supervised clustering is community detection based on Non-negative Matrix Factorization (NMF). Nonetheless, this approach faces several challenges, including a lack of specificity for the data type and reduced efficiency when the prior knowledge about each cluster contains errors. Because modularity is the basic and thorough criterion for evaluating and validating the performance of community detection methods, this paper proposes a new approach for modularity-based community detection that is similar to symmetric NMF. The proposed approach is a semi-supervised adaptive robust community detection model referred to as modularized robust semi-supervised adaptive symmetric NMF (MRASNMF). In this model, the modularity criterion is combined with the NMF model via a novel multi-view clustering method, and the tuning parameter is adjusted iteratively via an adaptive method. MRASNMF makes use of prior knowledge, the modularity criterion, and reinforcement of non-negative matrix factorization, and it has an iterative solution. The MRASNMF model was evaluated and validated on five real-world networks in comparison with existing semi-supervised community detection approaches. 
According to the findings of this study, the proposed strategy is most effective for all types of networks.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"239 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140571907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection and Classification of Brain Tumor Using Convolution Extreme Gradient Boosting Model and an Enhanced Salp Swarm Optimization","authors":"J. Jebastine","doi":"10.1007/s11063-024-11590-4","DOIUrl":"https://doi.org/10.1007/s11063-024-11590-4","url":null,"abstract":"<p>Some types of tumors in people with brain cancer grow so rapidly that their average size doubles in twenty-five days. Precisely determining the type of tumor enables physicians to conduct clinical planning and estimate dosage. However, accurate classification remains a challenging task due to the variable shape, size, and location of the tumors. The major objective of this paper is to detect and classify brain tumors. This paper introduces an effective Convolution Extreme Gradient Boosting model based on enhanced Salp Swarm Optimization (CEXGB-ESSO) for detecting brain tumors and their types. Initially, the MRI image is passed through bilateral filtering to remove noise. Then, the de-noised image is fed to the CEXGB model, in which Extreme Gradient Boosting (EXGB) replaces the fully connected layer of a CNN to detect and classify brain tumors. The model consists of numerous stacked convolutional neural networks (CNNs) for efficient automatic learning of features, which avoids overfitting and time-consuming processes. Then, the tumor type is predicted using the EXGB in the last layer, where there is no need to bring the weight values from the fully connected layer. Enhanced Salp Swarm Optimization (ESSO) is used to find the optimal hyperparameters of EXGB, which enhances convergence speed and accuracy. Our proposed CEXGB-ESSO model gives high performance in terms of accuracy (99), sensitivity (97.52), precision (98.2), and specificity (97.7). Also, the convergence analysis reveals the efficient optimization process of ESSO, obtaining optimal hyperparameter values around iteration 25. 
Furthermore, the classification results showcase the CEXGB-ESSO model’s capability to accurately detect and classify brain tumors.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"2014 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140571990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Gait Recognition Based on Frontal-View Walking Sequences Using Multi-modal Feature Representations and Learning","authors":"","doi":"10.1007/s11063-024-11554-8","DOIUrl":"https://doi.org/10.1007/s11063-024-11554-8","url":null,"abstract":"<h3>Abstract</h3> <p>Although much progress has been reported in gait recognition, most existing works adopt lateral-view parameters as gait features, which requires a large data collection area and limits the application of gait recognition in real-world practice. In this paper, we adopt frontal-view walking sequences rather than lateral-view sequences and propose a new gait recognition method based on multi-modal feature representations and learning. Specifically, we characterize walking sequences with two different kinds of frontal-view gait feature representations: holistic silhouette and dense optical flow. Pedestrian region extraction is performed by an improved YOLOv7 algorithm, called Gait-YOLO, to eliminate the effects of background interference. A multi-modal fusion module (MFM) is proposed to explore the intrinsic connections between silhouette and dense optical flow features by using squeeze and excitation operations at the channel and spatial levels. A gait feature encoder is further used to extract global walking characteristics, enabling efficient multi-modal information fusion. 
To validate the efficacy of the proposed method, we conduct experiments on the CASIA-B and OUMVLP gait databases and compare the performance of our proposed method with other existing state-of-the-art gait recognition methods.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"50 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140572358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Enhanced Attention for Image Captioning","authors":"","doi":"10.1007/s11063-024-11527-x","DOIUrl":"https://doi.org/10.1007/s11063-024-11527-x","url":null,"abstract":"<h3>Abstract</h3> <p>Image captioning, which involves automatically generating textual descriptions based on the content of images, has garnered increasing attention from researchers. Recently, Transformers have emerged as the preferred choice for the language model in image captioning models. Transformers leverage self-attention mechanisms to address gradient accumulation issues and eliminate the risk of gradient explosion commonly associated with RNNs. However, a challenge arises when the input features of the self-attention mechanism belong to different categories, as this may result in ineffective highlighting of important features. To address this issue, our paper proposes a novel attention mechanism called Self-Enhanced Attention (SEA), which replaces the self-attention mechanism in the decoder part of the Transformer model. After generating the attention weight matrix, SEA further adjusts the matrix based on its own distribution to effectively highlight important features. To evaluate the effectiveness of SEA, we conducted experiments on the COCO dataset, comparing the results with different visual models and training strategies. The experimental results demonstrate that the CIDEr score is significantly higher with SEA than without it. 
This indicates that the proposed mechanism successfully addresses the challenge of effectively highlighting important features.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"34 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140571996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Model UNet: An Adversarial Defense Mechanism for Robust Visual Tracking","authors":"Wattanapong Suttapak, Jianfu Zhang, Haohuo Zhao, Liqing Zhang","doi":"10.1007/s11063-024-11592-2","DOIUrl":"https://doi.org/10.1007/s11063-024-11592-2","url":null,"abstract":"<p>State-of-the-art object-tracking algorithms currently face a severe threat from adversarial attacks, which can significantly undermine their performance. In this research, we introduce MUNet, a novel defensive model designed for visual tracking. This model is capable of generating defensive images that can effectively counter attacks while maintaining a low computational overhead. To achieve this, we experiment with various configurations of MUNet models, finding that even a minimal three-layer setup significantly improves tracking robustness when the target tracker is under attack. Each model undergoes end-to-end training on randomly paired images, which include both clean and adversarial noise images. This training separately utilizes a pixel-wise denoiser and a feature-wise defender. Our proposed models significantly enhance tracking performance whether the target tracker is attacked or the target frame is clean. Additionally, MUNet can simultaneously share its parameters on both template and search regions. In experiments, the proposed models successfully defend against top attackers on six benchmark datasets: OTB100, LaSOT, UAV123, VOT2018, VOT2019, and GOT-10k. Performance results on all datasets show a significant improvement over all attackers, with a decline of less than 4.6% for every benchmark metric compared to the original tracker. 
Notably, our model demonstrates the ability to enhance tracking robustness in other blackbox trackers.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"89 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140571994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Self-Supervised Attributed Graph Clustering for Social Network Analysis","authors":"Hu Lu, Haotian Hong, Xia Geng","doi":"10.1007/s11063-024-11596-y","DOIUrl":"https://doi.org/10.1007/s11063-024-11596-y","url":null,"abstract":"<p>Deep graph clustering is an unsupervised learning task that divides nodes in a graph into disjoint regions with the help of graph auto-encoders. Currently, such methods have several problems, as follows. (1) The deep graph clustering method does not effectively utilize the generated pseudo-labels, resulting in sub-optimal model training results. (2) Each cluster has a different confidence level, which affects the reliability of the pseudo-label. To address these problems, we propose a Deep Self-supervised Attribute Graph Clustering model (DSAGC) to fully leverage the information of the data itself. We divide the proposed model into two parts: an upstream model and a downstream model. In the upstream model, we use the pseudo-label information generated by spectral clustering to form a new high-confidence distribution with which to optimize the model for a higher performance. We also propose a new reliable sample selection mechanism to obtain more reliable samples for downstream tasks. In the downstream model, we only use the reliable samples and the pseudo-label for the semi-supervised classification task without the true label. We compare the proposed method with 17 related methods on four publicly available citation network datasets, and the proposed method generally outperforms most existing methods in three performance metrics. 
By conducting a large number of ablation experiments, we validate the effectiveness of the proposed method.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"89 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140572262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power Optimization in Wireless Sensor Network Using VLSI Technique on FPGA Platform","authors":"Saranya Leelakrishnan, Arvind Chakrapani","doi":"10.1007/s11063-024-11495-2","DOIUrl":"https://doi.org/10.1007/s11063-024-11495-2","url":null,"abstract":"<p>Demand for high-performance wireless sensor networks (WSNs) is increasing, and their power requirements threaten their survival. Routing methods cannot optimize power consumption. To reduce power consumption, a VLSI-based power optimization technique is proposed in this article. Different elements of a WSN, such as sensor nodes, modulation schemes, and package data transmission, influence energy usage. A WSN power study showed that lowering the energy usage of sensor networks is critical. In this manuscript, a power optimization model for wireless sensor networks (POM-WSN) is proposed. The proposed system shows how to build and execute a power-saving strategy for WSNs using a customized collaborative unit with parallel processing capabilities on an FPGA (Field Programmable Gate Array) and a smart power component. The customizable cooperation unit focuses on applying specialized hardware to customize Operating System speed and transfer it to a soft Intel core. This device decreases the OS (Operating System) central processing unit (CPU) overhead associated with installing processor-based IoT (Internet of Things) devices. The smart power unit controls the soft CPU’s clock and physical peripherals, putting them in the right state depending on the hardware requirements of the program (tasks) being executed. Furthermore, by taking the command signal from a collaborative custom unit, it is necessary to adjust the amplitude and current. The efficiency and energy usage of the FPGA-based energy saver approach for sensor nodes are compared to the energy usage of processor-based WSN node implementations. 
Using FPGA programmable architecture, the research seeks to build effective power-saving approaches for WSNs.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"42 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140323771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstruction-Aware Kernelized Fuzzy Clustering Framework Incorporating Local Information for Image Segmentation","authors":"Chengmao Wu, Xiao Qi","doi":"10.1007/s11063-024-11450-1","DOIUrl":"https://doi.org/10.1007/s11063-024-11450-1","url":null,"abstract":"<p>Kernelized fuzzy C-means clustering with weighted local information is a widely applied robust segmentation algorithm for noisy images. However, it struggles to segment images polluted by strong noise. To address this issue, a reconstruction-aware kernelized fuzzy C-means clustering with rich local information is proposed in this paper. First, the optimization model of guided bilateral filtering is given for noisy images. Second, this filtering model is embedded into kernelized fuzzy C-means clustering with local information, and a novel fuzzy clustering model driven by reconstruction-filtering information is presented for noise-corrupted image segmentation. Finally, a tri-level alternating iterative algorithm is derived from the optimization model using optimization theory, and its convergence is rigorously analyzed. 
Extensive experimental results on noisy synthetic images and actual images indicate that, compared with the latest advanced fuzzy clustering-related algorithms, the algorithm presented in this paper has better segmentation performance and stronger robustness to noise, and its PSNR and ACC values increase by about 0.16–3.28 and 0.01–0.08, respectively.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"7 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140316311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multipath Attention and Adaptive Gating Network for Video Action Recognition","authors":"Haiping Zhang, Zepeng Hu, Dongjin Yu, Liming Guan, Xu Liu, Conghao Ma","doi":"10.1007/s11063-024-11591-3","DOIUrl":"https://doi.org/10.1007/s11063-024-11591-3","url":null,"abstract":"<p>3D CNNs handle temporal modeling on existing large action recognition datasets well and have made great progress in the field of RGB-based video action recognition. However, previous 3D CNN models also face several problems. For video feature extraction, convolutional kernels are often designed and fixed in each layer of the network, which may not suit the diversity of data in action recognition tasks. In this paper, a new model called <i>Multipath Attention and Adaptive Gating Network</i> (MAAGN) is proposed. The core idea of MAAGN is to use the <i>spatial difference module</i> (SDM) and the <i>multi-angle temporal attention module</i> (MTAM) in parallel at each layer of the multipath network to obtain spatial and temporal features, respectively, and then to dynamically fuse the spatial-temporal features with the <i>adaptive gating module</i> (AGM). SDM explores the spatial domain of the action video using difference operators based on the attention mechanism, while MTAM explores the temporal domain of the action video in terms of both global timing and local timing. AGM is built on an adaptive gate unit whose value is determined by the input of each layer and is unique to each layer; it dynamically fuses the spatial and temporal features in the paths of each layer of the multipath network. 
We construct the temporal network MAAGN, which has a competitive or better performance than state-of-the-art methods in video action recognition, and we provide exhaustive experiments on several large datasets to demonstrate the effectiveness of our approach.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"27 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140316535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}