Neurocomputing | Pub Date: 2024-10-05 | DOI: 10.1016/j.neucom.2024.128710
{"title":"Ranking-based adaptive query generation for DETRs in crowded pedestrian detection","authors":"","doi":"10.1016/j.neucom.2024.128710","DOIUrl":"10.1016/j.neucom.2024.128710","url":null,"abstract":"<div><div>Variants of DEtection TRansformer (DETRs) have shown promising performance in crowded pedestrian detection. However, we observe that DETRs are sensitive to one hyper-parameter, the number of queries. Adjusting this hyper-parameter is crucial for achieving competitive performance across different crowded pedestrian datasets. Existing query generation methods are limited to generating a fixed number of queries based on this hyper-parameter, which often leads to missed and incorrect detections because the number and density of pedestrians vary across crowded scenes. To address this challenge, we propose an adaptive query generation method called Ranking-based Adaptive Query Generation (RAQG). RAQG comprises three components: a ranking prediction head, a query supplementer, and Soft Gradient L1 Loss (SGL1). Specifically, we leverage the ranking of the positive training sample with the lowest confidence score to generate queries adaptively. The ranking prediction head predicts this ranking, which guides our query generation. Additionally, to refine the query generation process, we introduce a query supplementer that adjusts the number of queries based on the predicted ranking. Furthermore, we introduce SGL1, a novel loss function for training the ranking prediction head over a wide regression range. Our method is designed to be lightweight and universal, suitable for integration into any DETRs framework for crowded pedestrian detection. Experimental results on Crowdhuman and Citypersons datasets demonstrate that our RAQG method can generate queries adaptively and achieves competitive results. 
Notably, our approach achieves a state-of-the-art 39.4% MR on Crowdhuman.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142433941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-05 | DOI: 10.1016/j.neucom.2024.128698
{"title":"A lightweight video anomaly detection model with weak supervision and adaptive instance selection","authors":"","doi":"10.1016/j.neucom.2024.128698","DOIUrl":"10.1016/j.neucom.2024.128698","url":null,"abstract":"<div><div>Video anomaly detection aims to determine whether there are any abnormal events, behaviors or objects in a given video, which enables effective and intelligent public safety management. As video anomaly labeling is both time-consuming and expensive, most existing works employ unsupervised or weakly supervised learning methods. This paper focuses on weakly supervised video anomaly detection, in which the training videos are labeled only by whether or not they contain any anomalies, and lack information about the specific frames and quantities of anomalies. However, the uncertainty of weakly labeled data and the large model size prevent existing methods from wide deployment in real scenarios, especially resource-limited settings such as edge computing. In this paper, we develop a lightweight video anomaly detection model. On the one hand, we propose an adaptive instance selection strategy, which selects confident instances based on the model’s current status, thereby mitigating the uncertainty of weakly labeled data and subsequently improving the model’s performance. On the other hand, we design a lightweight multi-level temporal correlation attention module and an hourglass-shaped fully connected layer to construct the model, which can reduce the number of model parameters to only 0.56% of that of existing methods (e.g., RTFM). Extensive experiments on three public datasets, UCF-Crime, ShanghaiTech and XD-Violence, show that our model performs as well as or better than existing lightweight methods with a significantly reduced number of model parameters. 
Furthermore, by integrating the improved module designed in this paper with the VadCLIP method proposed by Wu et al., we achieve the state-of-the-art performance of non-lightweight models on the UCF-Crime and XD-Violence datasets.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-05 | DOI: 10.1016/j.neucom.2024.128664
{"title":"Modality-specific adaptive scaling and attention network for cross-modal retrieval","authors":"","doi":"10.1016/j.neucom.2024.128664","DOIUrl":"10.1016/j.neucom.2024.128664","url":null,"abstract":"<div><div>Data distributions and feature representations differ greatly across modalities. How to flexibly and accurately retrieve data from different modalities is a challenging problem. The mainstream common subspace methods only focus on the heterogeneity gap, and use a unified method to jointly learn the common representation of different modalities, which can easily make unified multi-modal fitting difficult. In this work, we introduce the concept of multi-modal information density discrepancy and propose a modality-specific adaptive scaling method incorporating prior knowledge, which can adaptively learn the most suitable network for different modalities. Second, to achieve efficient semantic fusion and suppress interference features, we propose a multi-level modal feature attention mechanism, which fuses text semantics efficiently through an attention mechanism and explicitly captures and shields interference features at multiple scales. In addition, to address the bottleneck of the cross-modal retrieval task caused by the insufficient quality of the multimodal common subspace and the defects of the Transformer structure, this paper proposes a cross-level interaction injection mechanism that fuses multi-level patch interactions without affecting the pre-trained model, in order to construct higher quality latent representation spaces and multimodal common subspaces. 
Comprehensive experimental results on four widely used cross-modal retrieval datasets show the proposed MASAN achieves the state-of-the-art results and significantly outperforms other existing methods.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-05 | DOI: 10.1016/j.neucom.2024.128669
{"title":"Hybrid lane change strategy of autonomous vehicles based on SOAR cognitive architecture and deep reinforcement learning","authors":"","doi":"10.1016/j.neucom.2024.128669","DOIUrl":"10.1016/j.neucom.2024.128669","url":null,"abstract":"<div><div>Research on lane change strategies for autonomous vehicles holds paramount importance in optimizing traffic flow efficiency, enhancing driving safety, and adapting to complex traffic environments. While numerous rule-based or machine-learning approaches have been explored to tackle the challenge of lane change on highways, they frequently exhibit limited performance owing to the complexity of driving environments. This study proposes a novel lane change strategy for autonomous vehicles, which utilizes a hybrid framework integrating the SOAR cognitive architecture and deep reinforcement learning (DRL) to address the lane change challenge on highways. First, we introduce a rule extraction algorithm, RuleCOSI+, which is based on tree ensemble algorithms and designed to extract concise lane change rules from large-scale human driving data. These straightforward rules, together with traffic regulations and safety rules, constitute the long-term memory of the SOAR cognitive architecture, enabling transparent decision-making processes. Next, by analyzing the clipping mechanism of the proximal policy optimization (PPO) algorithm, we propose an Adaptive Clipping PPO (ACPPO) algorithm which is based on the importance of samples. This algorithm adopts different clipping strategies for SOAR samples and ACPPO samples during the training process, enabling the algorithm to more effectively utilize samples with different levels of importance. Then, we propose a hybrid decision-making algorithm, SOAR-ACPPO, which combines the SOAR cognitive architecture with the ACPPO algorithm. This algorithm leverages SOAR’s prior knowledge to effectively and safely guide agent learning. 
Finally, by selecting appropriate intervention probability and weaning strategy, the system avoids inappropriate knowledge intervention and ensures adequate environment exploration. Simulation experiments conducted using the CARLA simulator illustrate that the proposed strategy not only improves model learning efficiency but also enhances driving efficiency and safety. Additionally, it demonstrates a certain degree of human-like characteristics and interpretability.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
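The abstract above refers to the clipping mechanism of PPO and to using different clipping strategies for samples from different sources. A minimal sketch of the standard PPO clipped surrogate with a per-source clip range follows. The abstract does not state how ACPPO actually chooses its clip ranges, so the `CLIP_RANGE` values and the source labels below are hypothetical and purely illustrative.

```python
# Standard PPO clipped surrogate for a single sample:
#   L = min(r * A, clip(r, 1 - eps, 1 + eps) * A)
# where r is the probability ratio pi_new / pi_old and A is the advantage.

# Hypothetical clip ranges per sample source (assumption, not from the paper).
CLIP_RANGE = {"soar": 0.1, "acppo": 0.2}


def clipped_surrogate(ratio: float, advantage: float, eps: float) -> float:
    """Return the PPO clipped objective for one sample."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def surrogate_for_source(ratio: float, advantage: float, source: str) -> float:
    """Apply the clip range associated with the sample's source."""
    return clipped_surrogate(ratio, advantage, CLIP_RANGE[source])


if __name__ == "__main__":
    # The same sample is clipped differently depending on its assumed source.
    print(surrogate_for_source(1.5, 2.0, "soar"))
    print(surrogate_for_source(1.5, 2.0, "acppo"))
```

A tighter clip range limits how far a single update can move the policy on that sample, which is one plausible way to weight sample sources differently.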
Neurocomputing | Pub Date: 2024-10-05 | DOI: 10.1016/j.neucom.2024.128703
{"title":"Robust and stochastic sparse subspace clustering","authors":"","doi":"10.1016/j.neucom.2024.128703","DOIUrl":"10.1016/j.neucom.2024.128703","url":null,"abstract":"<div><div>Sparse subspace clustering (SSC) has been widely employed in machine learning and pattern recognition, but it still faces scalability challenges when dealing with large-scale datasets. Recently, stochastic SSC (SSSC) has emerged as an effective solution by leveraging the dropout technique. However, SSSC cannot robustly handle noise, especially non-Gaussian noise, leading to unsatisfactory clustering performance. To address the above issues, we propose a novel robust and stochastic method called stochastic sparse subspace clustering with the Huber function (S3CH). The key idea is to introduce the Huber surrogate to measure the loss of the stochastic self-expression framework, thus S3CH inherits the advantage of the stochastic framework for large-scale problems while mitigating sensitivity to non-Gaussian noise. In algorithms, an efficient proximal alternating minimization (PAM)-based optimization scheme is developed. In theory, the convergence of the generated sequence is rigorously proved. Extensive numerical experiments on synthetic and six real datasets validate the advantages of the proposed method in clustering accuracy, noise robustness, parameter sensitivity, post-hoc analysis, and model stability.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
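The abstract above relies on the Huber function to reduce sensitivity to non-Gaussian noise. As a rough illustration, the sketch below implements the textbook Huber function on a scalar residual and compares it with the squared loss. The threshold name `delta` and its default value are assumptions, and the abstract does not specify how S3CH applies the function inside its self-expression objective.

```python
def huber(residual: float, delta: float = 1.0) -> float:
    """Textbook Huber function: quadratic near zero, linear in the tails."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def squared(residual: float) -> float:
    """Half squared loss, for comparison."""
    return 0.5 * residual * residual


if __name__ == "__main__":
    # Small residuals are treated the same; large ones grow only linearly
    # under Huber, so a single outlier contributes far less to the total.
    for r in (0.5, 3.0, 10.0):
        print(r, huber(r), squared(r))
```

The linear tail is what bounds the influence of heavy-tailed residuals relative to a purely quadratic loss.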
Neurocomputing | Pub Date: 2024-10-05 | DOI: 10.1016/j.neucom.2024.128675
{"title":"An architectural analysis of DeepOnet and a general extension of the physics-informed DeepOnet model on solving nonlinear parametric partial differential equations","authors":"","doi":"10.1016/j.neucom.2024.128675","DOIUrl":"10.1016/j.neucom.2024.128675","url":null,"abstract":"<div><div>The Deep Neural Operator, as proposed by Lu et al. (2021), marks a considerable advancement in solving parametric partial differential equations. This paper examines the DeepOnet model’s neural network design, focusing on the effectiveness of its trunk-branch structure in operator learning tasks. Three key advantages of the trunk-branch structure are identified: the global learning strategy, the independent operation of the trunk and branch networks, and the consistent representation of solutions. These features are especially beneficial for operator learning. Building upon these findings, we extend the traditional DeepOnet to a more general form from a network perspective, allowing a nonlinear interference of the branch net on the trunk net rather than the linear combination to which the conventional DeepOnet is limited. The operator model also incorporates physical information for enhanced integration. In a series of experiments tackling partial differential equations, the extended DeepOnet consistently outperforms the traditional DeepOnet, particularly in complex problems. 
Notably, the extended DeepOnet model shows substantial advancements in operator learning with nonlinear parametric partial differential equations and exhibits a remarkable capacity for reducing physics loss.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
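The abstract above contrasts the linear combination used by the conventional DeepOnet with a more general nonlinear coupling. The conventional form, as described by Lu et al. (2021), can be written as G(u)(y) ≈ Σ_k b_k(u) · t_k(y). A minimal sketch follows. The `branch` and `trunk` functions are hypothetical fixed stand-ins for trained networks, and the nonlinear extension is not shown because the abstract does not specify its form.

```python
import math
from typing import List


def branch(u_samples: List[float]) -> List[float]:
    """Hypothetical stand-in for the branch net: encodes the input function
    (given by its values at fixed sensor points) into p = 2 coefficients."""
    mean = sum(u_samples) / len(u_samples)
    return [mean, mean * mean]


def trunk(y: float) -> List[float]:
    """Hypothetical stand-in for the trunk net: encodes the query
    location y into p = 2 basis values."""
    return [math.sin(y), math.cos(y)]


def deeponet_output(u_samples: List[float], y: float) -> float:
    """Conventional DeepONet combination: dot product of branch and trunk."""
    b = branch(u_samples)
    t = trunk(y)
    return sum(bk * tk for bk, tk in zip(b, t))


if __name__ == "__main__":
    print(deeponet_output([1.0, 1.0], 0.0))
```

In this form the branch output only rescales a fixed set of trunk basis functions, which is the linear restriction the abstract refers to.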
Neurocomputing | Pub Date: 2024-10-05 | DOI: 10.1016/j.neucom.2024.128707
{"title":"A novel label-aware global graph construction method and spiking-coded graph neural network for intelligent process fault diagnosis","authors":"","doi":"10.1016/j.neucom.2024.128707","DOIUrl":"10.1016/j.neucom.2024.128707","url":null,"abstract":"<div><div>Fault diagnosis plays a crucial role in ensuring the safety and efficiency of industrial processes. However, traditional techniques often face difficulties in handling large-scale data characterized by complex structures and relationships. To efficiently represent industrial data as graphs and develop a low-energy-cost feature extraction model, a novel label-aware global graph construction method and a spiking graph convolutional network (SGCN) are proposed in this study to achieve intelligent process fault diagnosis. The label-aware method enhances graph data representation by capturing intrinsic correlations and global features. The SGCN integrates graph convolutional layers with spiking encoding, enabling effective feature extraction while offering computational efficiency advantages. A weighted loss function is introduced to mitigate data imbalance issues. Experiments on the Tennessee Eastman process, the Three-phase Flow Facility, and the de-propanizer distillation process demonstrate SGCN’s superior performance over baseline models in various fault scenarios, while significantly reducing computational costs. 
The proposed method offers promising potential for reliable and efficient fault diagnosis in complex real-world industrial environments.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-04 | DOI: 10.1016/j.neucom.2024.128674
{"title":"A novel deep learning algorithm for Phaeocystis counting and density estimation based on feature reconstruction and multispectral generator","authors":"","doi":"10.1016/j.neucom.2024.128674","DOIUrl":"10.1016/j.neucom.2024.128674","url":null,"abstract":"<div><div>Phaeocystis proliferation is a primary instigator of algal blooms, commonly known as red tides, posing a significant threat to marine life and severely disrupting marine ecosystems. Currently, no effective method exists for estimating Phaeocystis density, underscoring an urgent need for preventative measures against Phaeocystis blooms. Given the challenges associated with the varying sizes and frequent overlapping of Phaeocystis colonies, we propose an innovative counting algorithm that leverages feature reconstruction and multispectral generator modules. Utilizing deep learning, our method achieves accurate real-time density estimation and prediction of Phaeocystis colonies. The algorithm operates in two stages: first, a multispectral reconstruction block is trained to function as a multispectral generator; second, spectral and spatial features are integrated to predict density and perform counting. Our approach surpasses existing algorithms in accuracy for Phaeocystis counting and demonstrates the utility of multispectral data in enhancing the neural network’s ability to discern targets from their background.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-04 | DOI: 10.1016/j.neucom.2024.128706
{"title":"Graphical model for mixed data types","authors":"","doi":"10.1016/j.neucom.2024.128706","DOIUrl":"10.1016/j.neucom.2024.128706","url":null,"abstract":"<div><div>With the development of data collection technologies, data types have become more diverse. Additionally, graphical models, as tools for describing variable network relationships, have become increasingly popular in recent years. Previous studies have focused on graphical models tailored to specific types of data. However, these existing methods fail to identify graphical models for mixed data types. The difficulty of constructing graphical models for mixed data types lies in the fact that each type of data has its own space, which challenges the estimation of network relationships in a graphical model when the data are combined. To address this issue, this study presents a novel method that utilizes a vectorization and alignment strategy developed particularly for mixed data types, including scalar, interval-valued, compositional, and functional data, to estimate a graphical model. By iteratively employing a block-sparse graphical lasso method on aligned data, the method can achieve satisfactory results, as shown by numerous simulation experiments. The results also validate the superiority of our proposed method over potential competing methods. Furthermore, this method was applied to an engine damage propagation network as an illustrative example. 
Our method provides a novel modeling approach for graphical models in the case of mixed data types.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-04 | DOI: 10.1016/j.neucom.2024.128658
{"title":"QSIM: A Quantum-inspired hierarchical semantic interaction model for text classification","authors":"","doi":"10.1016/j.neucom.2024.128658","DOIUrl":"10.1016/j.neucom.2024.128658","url":null,"abstract":"<div><div>Semantic interaction modeling is a fundamental technology in natural language understanding that guides models to extract deep semantic information from text. Currently, the attention mechanism is one of the most effective techniques in semantic interaction modeling, which learns word-level attention representation by measuring the relevance between different words. However, because the attention mechanism is limited to word-level semantic interaction, it cannot provide the fine-grained interaction information that some text classification tasks need. In recent years, quantum-inspired language modeling methods have successfully constructed quantized representations of language systems in Hilbert spaces, which use density matrices to achieve fine-grained semantic interaction modeling.</div><div>This paper presents a <strong>Q</strong>uantum-inspired hierarchical <strong>S</strong>emantic <strong>I</strong>nteraction <strong>M</strong>odel (<strong>QSIM</strong>), which follows the sememe-word-sentence language construction principle and utilizes quantum entanglement theory to capture hierarchical semantic interaction information in Hilbert space. Our work builds on the idea of the attention mechanism and extends it. Specifically, we explore the original semantic space from a quantum theory perspective and derive the core semantic space using the Schmidt decomposition technique, where: (1) Sememe is represented as the unit vector in the two-dimensional minimum semantic space; (2) Word is represented as reduced density matrices in the core semantic space, where Schmidt coefficients quantify sememe-level semantic interaction. 
Compared to density matrices, reduced density matrices capture fine-grained semantic interaction information with lower computational cost; (3) Sentence is represented as quantum superposition states of words, and the degree of word-level semantic interaction is measured using entanglement entropy.</div><div>To evaluate the model’s performance, we conducted experiments on 15 text classification datasets. The experimental results demonstrate that our model is superior to classical neural network models and traditional quantum-inspired language models. Furthermore, the experiments also confirm two distinct advantages of QSIM: (1) <strong>flexibility</strong>, as it can be integrated into various mainstream neural network text classification architectures; and (2) <strong>practicability</strong>, as it alleviates the problem of parameter growth inherent in density matrix calculation in quantum language models.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":null,"pages":null},"PeriodicalIF":5.5,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
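The abstract above uses Schmidt coefficients and entanglement entropy as measures of interaction strength. As a generic illustration, the sketch below computes the standard entanglement entropy of a bipartite pure state from its Schmidt coefficients, S = -Σ_i λ_i² log₂ λ_i². How QSIM obtains the coefficients from text is not described in the abstract, so the function name and the example inputs are assumptions.

```python
import math
from typing import Sequence


def entanglement_entropy(schmidt_coeffs: Sequence[float]) -> float:
    """Entanglement entropy in bits from Schmidt coefficients.

    The coefficients are assumed to be non-negative with squares summing
    to 1, as produced by a Schmidt decomposition of a normalized state.
    """
    probs = [c * c for c in schmidt_coeffs]
    total = sum(probs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("squared Schmidt coefficients must sum to 1")
    # Terms with zero probability contribute nothing (limit of p*log p is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    # Product state: a single nonzero coefficient, so no entanglement.
    print(entanglement_entropy([1.0, 0.0]))
    # Maximally entangled two-level pair: equal coefficients.
    s = 1.0 / math.sqrt(2.0)
    print(entanglement_entropy([s, s]))
```

Under this reading, an entropy near zero indicates nearly independent components, while larger values indicate stronger interaction between them.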