Neurocomputing, Pub Date: 2024-08-23, DOI: 10.1016/j.neucom.2024.128424
Title: Differentially private and explainable boosting machine with enhanced utility

In this paper, we introduce DP-EBM*, an enhanced-utility version of the Differentially Private Explainable Boosting Machine (DP-EBM). DP-EBM* offers predictions for both classification and regression tasks, providing inherent explanations for its predictions while protecting sensitive individual information via differential privacy. DP-EBM* makes two major improvements over DP-EBM. First, we develop an error measure to assess how efficiently the privacy budget, a crucial factor for accuracy, is spent, and we optimize this measure. Second, we propose a feature pruning method that eliminates less important features during training. Our experimental results demonstrate that DP-EBM* outperforms state-of-the-art differentially private explainable models.
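The budget-efficiency concern in this abstract can be made concrete with the standard Laplace mechanism, which DP-EBM-style boosting uses to privatize per-bin aggregates before each split. The sketch below is illustrative only: the function names and the idea of a fixed per-round epsilon are assumptions, not the paper's optimized allocation.

```python
import math
import random


def laplace_noise(scale, rng=random.Random(0)):
    # Inverse-CDF sampling of Laplace(0, scale): u ~ Uniform(-0.5, 0.5).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


def dp_bin_sums(residual_sums, epsilon_round, sensitivity=1.0):
    """Release per-bin residual sums under epsilon_round-differential privacy.

    The noise scale is sensitivity / epsilon_round, so splitting a total
    budget across many boosting rounds shrinks epsilon_round and inflates
    the noise -- exactly the accuracy trade-off the paper optimizes.
    """
    scale = sensitivity / epsilon_round
    return [s + laplace_noise(scale) for s in residual_sums]
```

With a large per-round budget the released sums are nearly exact; with a small one, the noise dominates small bins, which is why pruning uninformative features (the paper's second improvement) saves budget.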
Neurocomputing, Pub Date: 2024-08-22, DOI: 10.1016/j.neucom.2024.128445
Title: GNN-based multi-source domain prototype representation for cross-subject EEG emotion recognition

Emotion recognition from electroencephalography (EEG) signals is a major area of affective computing. However, distributional differences between subjects have greatly hindered the large-scale application of EEG emotion recognition techniques. Most existing cross-subject methods treat multiple subjects as a single source domain, which leaves significant distributional differences within that domain and hinders the model's ability to generalise to target subjects. In this paper, we propose a new method that combines graph neural network-based prototype representation of multiple source domains with a clustering-based similarity loss. It consists of three parts: multi-source domain prototype representation, a graph neural network, and the loss. The prototype representation treats each subject in the source domain as a sub-source domain and extracts prototype features, learning a more fine-grained feature representation. The graph neural network models the associations between prototypes and samples. In addition, we propose a similarity loss based on the clustering idea, which makes maximal use of the similarity between samples in the target domain while ensuring that classification performance does not degrade. We conduct extensive experiments on two benchmark datasets, SEED and SEED IV. The experimental results validate the effectiveness of the proposed multi-source domain fusion approach and show its superiority over existing methods on cross-subject classification tasks.
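At its simplest, "one prototype per sub-source domain" reduces to a per-(subject, class) mean of feature vectors. A minimal sketch of that starting point (names are illustrative; in the paper the prototypes feed into a GNN and are refined jointly with it):

```python
import numpy as np


def domain_prototypes(features, subjects, labels):
    """One prototype (mean feature vector) per (source subject, class) pair.

    features: (n_samples, n_dims) array of EEG feature vectors
    subjects: (n_samples,) subject id of each sample
    labels:   (n_samples,) emotion class of each sample
    """
    protos = {}
    for s in np.unique(subjects):
        for c in np.unique(labels):
            mask = (subjects == s) & (labels == c)
            if mask.any():
                protos[(s, c)] = features[mask].mean(axis=0)
    return protos
```

Treating each subject as its own sub-source domain keeps within-prototype variance low, which is the fine-grained representation the abstract argues for.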
Neurocomputing, Pub Date: 2024-08-22, DOI: 10.1016/j.neucom.2024.128419
Title: Overcoming language priors in visual question answering with cumulative learning strategy

The performance of visual question answering (VQA) has improved greatly over the last few years. However, many current VQA models rely on superficial linguistic correlations between questions and answers, often failing to learn sufficient multi-modal knowledge from both vision and language, and thus suffer significant performance drops. To address this issue, the VQA-CP v2.0 dataset was developed to reduce language biases by greedily re-partitioning the training and test sets of VQA v2.0. Since high performance on real-world datasets requires effective learning from minor classes, in this paper we analyse the skewed long-tail distributions present in VQA-CP v2.0 and propose a new ensemble-based, parameter-insensitive framework. The framework is built on two representation learning branches and a joint learning block, designed to reduce language biases in VQA tasks. The representation learning branches ensure strong representations of both the major and minor classes. The joint learning block forces the model to concentrate initially on major classes for robust representation and then gradually shift its focus towards minor classes for classification as training progresses. Experimental results demonstrate that our approach outperforms state-of-the-art methods on VQA-CP v2.0 without requiring additional annotations. Notably, on the "num" question type, our framework exceeds the second-best method without extra annotations by 8.64%. Meanwhile, our approach does not sacrifice accuracy on VQA v2.0 compared with the baseline model.
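The gradual shift from major to minor classes described above is typically implemented as a scheduled convex combination of the two branch losses. A sketch assuming a parabolic decay schedule, common in cumulative-learning work; the paper's actual schedule and branch losses may differ:

```python
def cumulative_weight(epoch, total_epochs):
    """Weight on the majority-focused branch: starts at 1, decays to 0."""
    t = epoch / total_epochs
    return 1.0 - t * t


def joint_loss(loss_major, loss_minor, epoch, total_epochs):
    """Convex combination that moves focus from major to minor classes."""
    a = cumulative_weight(epoch, total_epochs)
    return a * loss_major + (1.0 - a) * loss_minor
```

Early epochs optimise almost purely for robust majority-class representation; late epochs optimise almost purely for tail-class classification, without a hard switch.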
Neurocomputing, Pub Date: 2024-08-22, DOI: 10.1016/j.neucom.2024.128437
Title: From text to mask: Localizing entities using the attention of text-to-image diffusion models

Diffusion models have recently revolutionized the field of text-to-image generation. Their unique way of fusing text and image information underlies their remarkable ability to generate highly text-related images. From another perspective, these generative models carry clues about the precise correlation between words and pixels. This work proposes a simple but effective method that utilizes the attention mechanism in the denoising network of text-to-image diffusion models. Without additional training or inference-time optimization, the semantic grounding of phrases can be attained directly. We evaluate our method on Pascal VOC 2012 and Microsoft COCO 2014 in the weakly supervised semantic segmentation setting, where it outperforms prior methods. In addition, the acquired word-pixel correlation generalizes to the learned text embeddings of customized generation methods, requiring only a few modifications. To validate our discovery, we introduce a new practical task, "personalized referring image segmentation", together with a new dataset. Experiments in various situations demonstrate the advantages of our method over strong baselines on this task. In summary, our work reveals a novel way to extract the rich multi-modal knowledge hidden in diffusion models for segmentation.
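Aggregating a token's cross-attention maps over layers and heads and thresholding the result is one straightforward reading of "utilize the attention mechanism". A numpy sketch of that aggregation; the array layout, normalization, and threshold are assumptions, not the paper's exact recipe:

```python
import numpy as np


def token_mask(attn, token_idx, hw, threshold=0.5):
    """Turn cross-attention weights for one text token into a binary mask.

    attn:      (layers, heads, H*W, n_tokens) cross-attention weights
    token_idx: which text token to localize
    hw:        (H, W) spatial shape of the attention grid
    """
    m = attn[:, :, :, token_idx].mean(axis=(0, 1))   # average over layers & heads
    m = m.reshape(hw)
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)   # min-max normalize to [0, 1]
    return m >= threshold
```

The appeal, as the abstract notes, is that no training or test-time optimization is involved: the mask is read directly out of the frozen denoiser.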
Neurocomputing, Pub Date: 2024-08-22, DOI: 10.1016/j.neucom.2024.128396
Title: Clusterwise Independent Component Analysis (C-ICA): An R package for clustering subjects based on ICA patterns underlying three-way (brain) data

In many areas of science, such as neuroscience, genomics, and text mining, important and challenging research questions involve the study of (subject) heterogeneity present in three-way data. In clinical neuroscience, for example, disclosing differences between subjects in the resting-state networks (RSNs) underlying multi-subject fMRI data (i.e., time by voxel by subject three-way data) may advance the subtyping of psychiatric and mental diseases. Recently, the Clusterwise Independent Component Analysis (C-ICA) method was proposed, which discloses heterogeneity between subjects in the RSNs present in multi-subject rs-fMRI data [1]. Up to now, however, no publicly available software has existed that allows users to fit C-ICA to empirical data. The goal of this paper, therefore, is to present the CICA R package, which contains the functions needed to estimate the C-ICA parameters and to interpret and visualize the analysis output. The package also includes functions to select suitable initial values for the C-ICA model parameters and to determine the optimal number of clusters and components for a given empirical data set (i.e., model selection). The use of the main functions of the package is discussed and demonstrated with simulated data, and the analytical choices the user has to make (e.g., starting values) are explained and shown step by step. The rich functionality of the package is further illustrated by applying C-ICA to empirical rs-fMRI data from a group of Alzheimer patients and elderly control subjects, and to multi-country stock market data. Finally, extensions to the C-ICA algorithm and to the model selection procedures that could be implemented in future releases of the package are discussed.

Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0925231224011676/pdfft?md5=3a56afa04cb1c782ebc52543f39bdb32&pid=1-s2.0-S0925231224011676-main.pdf
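The alternating logic behind clusterwise decomposition (fit shared components per cluster, reassign each subject to the cluster whose components reconstruct its data best, repeat) can be sketched compactly. Here PCA via SVD stands in for ICA purely for brevity, and the deterministic initialization is an assumption, so this is an illustration of the alternating scheme, not the CICA package's estimator:

```python
import numpy as np


def fit_clusterwise(data, k_clusters, n_comp, n_iter=10):
    """Alternate between fitting a rank-n_comp spatial basis per cluster and
    reassigning each subject to the cluster whose basis reconstructs it best.

    data: list of (time x voxel) matrices, one per subject.
    """
    labels = np.arange(len(data)) % k_clusters       # balanced deterministic start
    for _ in range(n_iter):
        bases = []
        for c in range(k_clusters):
            members = [d for d, l in zip(data, labels) if l == c]
            if not members:
                bases.append(None)                   # empty cluster: no basis
                continue
            _, _, vt = np.linalg.svd(np.vstack(members), full_matrices=False)
            bases.append(vt[:n_comp])                # shared spatial components

        def recon_err(d, v):
            return np.inf if v is None else np.linalg.norm(d - d @ v.T @ v)

        labels = np.array([np.argmin([recon_err(d, v) for v in bases])
                           for d in data])
    return labels, bases
```

Subjects sharing the same underlying spatial patterns end up in the same cluster, which is the heterogeneity-disclosing behaviour the package provides (with ICA components and proper model selection).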
Neurocomputing, Pub Date: 2024-08-22, DOI: 10.1016/j.neucom.2024.128430
Title: Semantic dependency and local convolution for enhancing naturalness and tone in text-to-speech synthesis

Self-attention-based networks have become increasingly popular thanks to their exceptional performance in parallel training and global context modeling. However, they may fall short in capturing local dependencies, particularly on datasets with strong local correlations. To address this challenge, we propose a novel method that utilizes semantic dependency to extract linguistic information from the original text; the semantic relationships between nodes serve as prior knowledge to refine the self-attention distribution. Additionally, to better fuse local contextual information, we introduce a one-dimensional convolutional neural network to generate the query and value matrices of the self-attention mechanism, taking advantage of the strong correlation between neighbouring input characters. We apply this variant of the self-attention network to text-to-speech tasks and propose a non-autoregressive neural text-to-speech model. To enhance pronunciation accuracy, we separate tones from phonemes as independent features during model training. Experimental results show that our model performs well in speech synthesis; in particular, the proposed method significantly improves the handling of pause, stress, and intonation.
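How convolution-generated queries and values slot into scaled dot-product attention can be sketched with numpy. The single-head layout, "same" padding, and shared filter are illustrative assumptions, not the paper's architecture:

```python
import numpy as np


def conv1d_seq(x, w):
    """1D convolution along the time axis with 'same' padding.

    x: (T, d) character features; w: (k, d) filter applied at every position.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * w).sum(axis=0) for t in range(x.shape[0])])


def local_self_attention(x, wq, wv, k_proj):
    """Self-attention where Q and V come from 1D convolutions (local context)
    while K comes from the usual linear projection."""
    q, v = conv1d_seq(x, wq), conv1d_seq(x, wv)
    kmat = x @ k_proj
    scores = q @ kmat.T / np.sqrt(x.shape[1])
    a = np.exp(scores - scores.max(axis=1, keepdims=True))   # stable softmax
    a /= a.sum(axis=1, keepdims=True)
    return a @ v
```

Because each query already summarises a small window of neighbouring characters, the attention distribution is biased towards locally coherent matches, which is the stated motivation for the convolutional Q and V.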
Neurocomputing, Pub Date: 2024-08-22, DOI: 10.1016/j.neucom.2024.128458
Title: A deep top-down framework towards generalisable multi-view pedestrian detection

Multiple cameras are frequently used to detect heavily occluded pedestrians. State-of-the-art methods for deep multi-view pedestrian detection usually project the feature maps extracted from multiple views onto the ground plane through homographies for information fusion. However, this bottom-up approach easily overfits the camera locations and orientations of the training dataset, which leads to weak generalisation and compromises real-world application. To address this problem, a deep top-down framework, TMVD, is proposed: for each cell of the discretized ground plane, the feature maps inside rectangular boxes of average pedestrian size placed at that cell in the multiple views are weighted and embedded into a top view, and a convolutional neural network infers pedestrian locations from this embedding. The proposed method significantly improves generalisation compared with benchmark methods for deep multi-view pedestrian detection, and it also significantly outperforms other top-down methods.
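The geometric step both families of methods rely on (mapping ground-plane coordinates into each camera view with a 3x3 homography) is standard. A minimal numpy sketch; the box-construction helper and its parameters are illustrative, not TMVD's implementation:

```python
import numpy as np


def project_points(H, pts):
    """Apply a 3x3 homography to (N, 2) points: lift to homogeneous
    coordinates, map, then divide by the last coordinate."""
    ph = np.hstack([pts, np.ones((len(pts), 1))])
    q = ph @ H.T
    return q[:, :2] / q[:, 2:3]


def cell_box_corners(i, j, cell, w, h):
    """Corners of a w x h ground-plane box centred on cell (i, j) of size
    `cell`; projecting these into each view gives the regions to pool."""
    cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
    return np.array([[cx - w / 2, cy - h / 2], [cx + w / 2, cy - h / 2],
                     [cx + w / 2, cy + h / 2], [cx - w / 2, cy + h / 2]])
```

Pooling per-cell boxes rather than warping whole feature maps is what decouples the learned detector from any one camera geometry.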
Neurocomputing, Pub Date: 2024-08-22, DOI: 10.1016/j.neucom.2024.128390
Title: A general method for mode decomposition on additive mixture: Generalized Variational Mode Decomposition and its sequentialization

The Variational Mode Decomposition (VMD) method was proposed to separate a non-stationary signal mixture by solving an optimization problem. The method is powerful and reconstructs the signal components precisely when they are orthogonal (or quasi-orthogonal) in the frequency domain. The crucial limitation of VMD is that it requires the number of modes before decomposition; moreover, its applications have been confined to the narrow scope of 1D and 2D signal processing.

In this paper, by inheriting and developing the core idea of VMD, we build a general form of the method and extend it to mode decomposition of arbitrary additive mixtures, beyond signal processing alone. To overcome the obstacle of the modal number, we sequentialize the generalized VMD method so that modes can be extracted one by one, without knowing their number a priori. After generalizing and sequentializing VMD, we apply the resulting methods to different additive-mixture problems such as texture segmentation, Gaussian Mixture Models (GMM), and clustering. From the experiments, we conclude that the generalized and sequentialized VMD methods can solve a variety of classical problems from the viewpoint of mode decomposition, which implies that our methods have high generality and wide applicability. A raw MATLAB implementation of the algorithm is available at https://github.com/changwangke/SGVMD_additive_Clustering/blob/main/SGVMD_clustering.m.
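The "one mode at a time" idea can be illustrated by applying VMD's Wiener-filter mode update greedily around the spectral peak of the current residual. This single-pass sketch omits the ADMM iterations and centre-frequency updates of the full method, and the bandwidth parameter alpha is an assumption:

```python
import numpy as np


def extract_mode(x, alpha=1000.0):
    """Pull out one narrow-band mode: locate the spectral peak w0, then apply
    the VMD-style Wiener filter 1 / (1 + 2*alpha*(w - w0)^2)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x))                 # normalized frequency axis
    w0 = freqs[np.argmax(np.abs(X))]
    gain = 1.0 / (1.0 + 2.0 * alpha * (freqs - w0) ** 2)
    return np.fft.irfft(X * gain, n=len(x))


def sequential_decompose(x, n_modes, alpha=1000.0):
    """Extract modes one by one from the residual -- no modal number needed
    up front; stop whenever the residual energy is small enough."""
    modes, resid = [], x.astype(float)
    for _ in range(n_modes):
        m = extract_mode(resid, alpha)
        modes.append(m)
        resid = resid - m
    return modes, resid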
Neurocomputing, Pub Date: 2024-08-22, DOI: 10.1016/j.neucom.2024.128429
Title: Self-organizing hypercomplex-valued adaptive network

A novel unsupervised artificial intelligence system is presented whose input signals and trainable weights are complex or hypercomplex values. The system exploits the property of complex multiplication that the multiplicand is not only scaled but also rotated: the more similar an input signal and a reference signal are, the more likely the input belongs to the corresponding class. The data assigned to a class during training is stored both on a generic layer and on a layer extracting special features of the signal, so the same cluster can hold a general description as well as the details of a signal. This property is vital for deciding whether to assign a signal to an existing class or to a new one. To ensure that only valid new classes are opened, the system determines variances by comparing each input signal component with the weights, and it adaptively adjusts its activation and threshold functions for an optimal classification decision. The system knows the boundaries of all its clusters at any time. Experiments demonstrate that the system clusters the data of multiple classes autonomously, quickly, and with high accuracy.

Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0925231224012001/pdfft?md5=c4ac73d840489a544f0af38bdb8b25c0&pid=1-s2.0-S0925231224012001-main.pdf
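The scaling-and-rotation effect of complex multiplication can be shown with a normalized complex inner product: a reference rotated by any global phase still matches itself perfectly, while a dissimilar signal scores lower. Function names and the nearest-reference rule are illustrative, not the system's actual decision functions:

```python
import numpy as np


def complex_similarity(s, r):
    """|<r, s>| / (||s|| ||r||): 1.0 for any phase-rotated, scaled copy of r."""
    return np.abs(np.vdot(r, s)) / (np.linalg.norm(s) * np.linalg.norm(r))


def classify(signal, references):
    """Assign the signal to the reference (class) it matches best."""
    sims = [complex_similarity(signal, r) for r in references]
    return int(np.argmax(sims)), max(sims)
```

A thresholded version of this score is what lets such a system decide between assigning a signal to an existing cluster and opening a new class.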
Neurocomputing, Pub Date: 2024-08-22, DOI: 10.1016/j.neucom.2024.128443
Title: Memory-efficient DRASiW Models

Weightless Neural Networks (WNNs) are well suited to federated learning owing to their robustness and computational efficiency. These scenarios require models with a small memory footprint and the ability to aggregate knowledge from multiple models. In this work, we demonstrate the effectiveness of using Bloom filter variations to implement DRASiW models, an adaptation of WNNs that records both the presence and the frequency of patterns, with minimized memory usage. Across various datasets, DRASiW models show performance competitive with models such as Random Forest, k-Nearest Neighbors, Multi-layer Perceptron, and Support Vector Machines, at an acceptable space trade-off. Furthermore, our findings indicate that Bloom filter variations, such as the Count-Min Sketch, can reduce the memory footprint of DRASiW models by up to 27% while maintaining performance and enabling distributed and federated learning strategies.
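A Count-Min Sketch replaces exact per-pattern counters with a small fixed 2-D array of hashed counters: queries may overestimate a frequency (hash collisions only add) but never underestimate it, which is what makes it a safe drop-in for DRASiW's pattern counts. A generic sketch; the width, depth, and hash choice are illustrative, not the paper's configuration:

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counters in depth x width space, independent of
    how many distinct patterns are stored."""

    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _idx(self, row, key):
        # One independent-ish hash per row, derived by salting with the row id.
        h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % self.width

    def add(self, key):
        for r in range(self.depth):
            self.table[r][self._idx(r, key)] += 1

    def count(self, key):
        # Minimum across rows is the least-collided, hence tightest, estimate.
        return min(self.table[r][self._idx(r, key)] for r in range(self.depth))
```

Because two sketches with the same shape and hashes can be merged by element-wise addition of their tables, this structure also supports the federated aggregation scenario the abstract targets.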