Information Fusion, Pub Date: 2024-10-28, DOI: 10.1016/j.inffus.2024.102758

Alignable kernel network

To enhance the adaptability and performance of Convolutional Neural Networks (CNNs), we present an adaptable mechanism called the Alignable Kernel (AliK) unit, which dynamically adjusts a model's receptive field (RF) dimensions in response to varying stimuli. The branches of the AliK unit are integrated through a novel align-transformation softmax attention that incorporates prior knowledge through rank-ordering constraints. The attention weightings across the branches establish the effective RF scales leveraged by neurons in the fusion layer. This mechanism is inspired by neuroscientific observations that the RF dimensions of neurons in the visual cortex vary with the stimulus, a feature often overlooked in CNN architectures. By aggregating successive AliK ensembles, we develop a deep network architecture named the Alignable Kernel Network (AliKNet). The interdisciplinary design of AliKNet improves the network's performance and interpretability by taking direct inspiration from the structure and function of human neural systems, especially the visual cortex. Empirical evaluations on image classification and semantic segmentation demonstrate that AliKNet excels over numerous state-of-the-art architectures without increasing model complexity. Furthermore, we show that AliKNet can identify target objects across various scales, confirming its ability to dynamically adapt its RF size in response to the input data.
Information Fusion, Pub Date: 2024-10-28, DOI: 10.1016/j.inffus.2024.102757

Divide and augment: Supervised domain adaptation via sample-wise feature fusion

The training of deep models relies on appropriate regularization from a copious amount of labeled data, yet obtaining a large and well-annotated dataset is costly. Supervised domain adaptation (SDA) therefore becomes attractive, especially when it regularizes these networks for a data-scarce target domain by exploiting an available data-rich source domain. Unlike previous methods that focus on a cumbersome adversarial learning scheme, we assume that a source or target sample in the feature space can be regarded as a combination of (1) domain-oriented features (i.e., those reflecting the differences among domains) and (2) class-specific features (i.e., those inherently defining a specific class). Exploiting this, we present Divide and Augment (DivAug), a feature fusion-based data augmentation framework that performs target-domain augmentation by transforming source samples into the target domain in an energy-efficient manner. Specifically, with a novel semantic inconsistency loss based on a multi-task ensemble learning scheme, DivAug enforces two encoders to learn the decomposed domain-oriented and class-specific features, respectively. Furthermore, we propose a simple sample-wise feature fusion rule that transforms source samples into the target domain by combining class-specific features from a source sample with domain-oriented features from a target sample. Extensive experiments demonstrate that our method outperforms the current state-of-the-art methods across various datasets in SDA settings.
Information Fusion, Pub Date: 2024-10-24, DOI: 10.1016/j.inffus.2024.102750

IFNet: Data-driven multisensor estimate fusion with unknown correlation in sensor measurement noises

In recent years, multisensor fusion for state estimation has gained considerable attention. The effectiveness of optimal fusion estimation methods relies heavily on the correlation among sensor measurement noises. To enhance estimate fusion performance by mining unknown correlation in the data, this paper introduces a novel multisensor fusion approach using an information filtering neural network (IFNet) for discrete-time nonlinear state-space models with cross-correlated measurement noises. The method offers three notable advantages. First, it provides a data-driven perspective for tackling uncertain correlation in multisensor estimate fusion while preserving the interpretability of information filtering. Second, by harnessing the ability of recurrent neural networks (RNNs) to handle data streams, it can dynamically update the fusion weights between sensors to improve fusion accuracy. Third, it has lower complexity than the state-of-the-art KalmanNet measurement fusion method when the fusion problem involves a large number of sensors. Numerical simulations demonstrate that IFNet achieves better fusion accuracy than traditional filtering methods and KalmanNet fusion filtering when the correlation among measurement noises is unknown.
Information Fusion, Pub Date: 2024-10-23, DOI: 10.1016/j.inffus.2024.102749

Efficient audio–visual information fusion using encoding pace synchronization for Audio–Visual Speech Separation

Contemporary audio–visual speech separation (AVSS) models typically use encoders that merge audio and visual representations by concatenating them at a specific layer. This approach assumes that both modalities progress at the same pace and that information is adequately encoded at the chosen fusion layer. However, this assumption is often flawed due to inherent differences between the audio and visual modalities. In particular, the audio modality, being more directly tied to the final output (i.e., denoised speech), tends to converge faster than the visual modality. This discrepancy creates a persistent challenge in selecting the appropriate layer for fusion. To address this, we propose the Encoding Pace Synchronization Network (EPS-Net) for AVSS. EPS-Net allows for the independent encoding of the two modalities, enabling each to be processed at its own pace. At the same time, it establishes communication between the audio and visual modalities at corresponding encoding layers, progressively synchronizing their encoding speeds. This approach facilitates the gradual fusion of information while preserving the unique characteristics of each modality. The effectiveness of the proposed method has been validated through extensive experiments on the LRS2, LRS3, and VoxCeleb2 datasets, demonstrating superior performance over state-of-the-art methods.
Information Fusion, Pub Date: 2024-10-23, DOI: 10.1016/j.inffus.2024.102752

DSEM-NeRF: Multimodal feature fusion and global–local attention for enhanced 3D scene reconstruction

3D scene understanding often suffers from insufficient detail capture and poor adaptability to multi-view changes. To this end, we propose DSEM-NeRF, a NeRF-based 3D scene understanding model that effectively improves the reconstruction quality of complex scenes through multimodal feature fusion and a global–local attention mechanism. DSEM-NeRF extracts multimodal features such as color, depth, and semantics from multi-view 2D images, and accurately captures key regions by dynamically adjusting the importance of features. Experimental results show that DSEM-NeRF outperforms many existing models on the LLFF and DTU datasets, with PSNR reaching 20.01, 23.56, and 24.58, respectively, and SSIM reaching 0.834. In particular, it shows strong robustness under complex scenes and multi-view changes, verifying the effectiveness and reliability of the model.
Information Fusion, Pub Date: 2024-10-22, DOI: 10.1016/j.inffus.2024.102769

Hypergraph convolutional networks with multi-ordering relations for cross-document event coreference resolution

Recognizing the coreference relationship between different event mentions in text (i.e., event coreference resolution) is an important task in natural language processing. It helps to understand the associations between events in a text and plays an important role in information extraction, question answering systems, and reading comprehension. Existing research has made progress in improving the performance of event coreference resolution, but shortcomings remain. For example, most existing methods analyze the event data in a document in a serial processing mode, without considering the complex relationships between events, and they struggle to mine the deep semantics of events. To address these problems, this paper proposes HGCN-ECR, a cross-document event coreference resolution method based on hypergraph convolutional neural networks. First, a BiLSTM-CRF model labels the semantic roles of events extracted from multiple documents. Based on the labeling results, the trigger and non-trigger words of each event are determined, and a multi-document event hypergraph is constructed around the event trigger words. Hypergraph convolutional neural networks then learn higher-order semantic information in the multi-document event hypergraph, and a multi-head attention mechanism is introduced to capture the hidden features of different event-relationship types by treating each relationship type as a separate set of attention heads. Finally, a feed-forward neural network and average-link clustering compute event coreference scores and complete the clustering of coreferent events, realizing cross-document event coreference resolution. Experimental results show that the proposed method is superior to the baseline models.
Information Fusion, Pub Date: 2024-10-22, DOI: 10.1016/j.inffus.2024.102747

Multimodal dual perception fusion framework for multimodal affective analysis

The misuse of social platforms and the difficulty of regulating posted content have led to a surge of negative sentiment, sarcasm, and the rampant spread of fake news. In response, multimodal sentiment analysis, sarcasm detection, and fake news detection based on image and text have recently attracted considerable attention. Because these areas share semantic and sentiment features and confront related fusion challenges in deciphering complex human expressions across different modalities, integrating these multimodal classification tasks, which share commonalities across different scenarios, into a unified framework is expected to simplify research in sentiment analysis and enhance the effectiveness of classification tasks involving both semantic and sentiment modeling. We therefore treat them as integral components of a broader research area, multimodal affective analysis towards semantics and sentiment, and propose a novel multimodal dual perception fusion framework (MDPF). Specifically, MDPF contains three core procedures: (1) generating bootstrapping language–image knowledge to enrich the original modality space, and using cross-modal contrastive learning to align the text and image modalities and understand their underlying semantics and interactions; (2) designing a dynamic connective mechanism to adaptively match image–text pairs, jointly employing a Gaussian-weighted distribution to intensify semantic sequences; (3) constructing a cross-modal graph to preserve the structured information of both image and text data and to share information between modalities, while introducing sentiment knowledge to refine the edge weights of the graph and capture cross-modal sentiment interaction. We evaluate MDPF on three publicly available datasets across three tasks, and the empirical results demonstrate the superiority of the proposed model.
Information Fusion, Pub Date: 2024-10-21, DOI: 10.1016/j.inffus.2024.102748

Vul-LMGNNs: Fusing language models and online-distilled graph neural networks for code vulnerability detection

Code language models (codeLMs) and graph neural networks (GNNs) are widely used in code vulnerability detection. However, a critical yet often overlooked issue is that GNNs primarily rely on aggregating information from adjacent nodes, limiting structural information transfer to single-layer updates. In code graphs, nodes and relationships typically require cross-layer information propagation to fully capture complex program logic and potential vulnerability patterns. Furthermore, while some studies use codeLMs to supplement GNNs with code semantic information, existing integration methods have not fully explored the potential of their collaborative effects.

To address these challenges, we introduce Vul-LMGNNs, which integrates pre-trained codeLMs with GNNs and leverages knowledge distillation to facilitate cross-layer propagation of both code semantic knowledge and structural information. Specifically, Vul-LMGNNs uses code property graphs (CPGs) to incorporate code syntax, control flow, and data dependencies, and employs gated GNNs to extract structural information from the CPG. To achieve cross-layer information transmission, we implement an online knowledge distillation (KD) scheme in which a single student GNN acquires structural information extracted from a simultaneously trained counterpart through an alternating training procedure. Additionally, we leverage pre-trained codeLMs to extract semantic features from code sequences. Finally, we propose an "implicit-explicit" joint training framework to better leverage the strengths of both codeLMs and GNNs. In the implicit phase, codeLMs initialize the node embeddings of each student GNN, and online knowledge distillation propagates both code semantics and structural information across layers. In the explicit phase, we perform linear interpolation between the codeLM and the distilled GNN to learn a late-fusion model. Evaluated on four real-world vulnerability datasets, the proposed method outperforms 17 state-of-the-art approaches. Our source code can be accessed via GitHub: https://github.com/Vul-LMGNN/vul-LMGGNN.
Information Fusion, Pub Date: 2024-10-20, DOI: 10.1016/j.inffus.2024.102743

Dynamic clustering-based consensus model for large-scale group decision-making considering overlapping communities

A consensus-reaching strategy is crucial in large-scale group decision-making (LSGDM), as it serves as an effective approach to reducing group conflicts. Meanwhile, the social network relationships common in large groups can affect information exchange, thereby influencing the consensus-reaching process (CRP) and the decision results. How to leverage social network information in LSGDM to obtain an agreed solution has therefore received widespread attention. However, most existing research assumes that communities are relatively independent during the dimension-reduction stage of LSGDM and neglects the possibility of different overlaps between them. Moreover, the impact of overlapping communities on the CRP has not been adequately explored. In addition, the dynamic variations in clusters and their weights caused by evaluation updates need further study. To address these issues, this paper proposes a dynamic clustering-based consensus-reaching method for LSGDM that considers the impact of overlapping communities. First, a LINE-based label propagation algorithm is designed to cluster decision makers (DMs) and detect overlapping communities using social network information. An overlapping-community-driven feedback mechanism is then developed to enhance group consensus by utilizing the bridging role of overlapping DMs. During the CRP, clusters and their weights are dynamically updated with trust evolution as evaluations are iterated. Finally, a case study using the FilmTrust dataset is conducted to verify the effectiveness of the proposed method. Simulation experiments and comparative analysis demonstrate the capability of our method to model practical scenarios and address LSGDM problems in social network contexts.
Information Fusion, Pub Date: 2024-10-19, DOI: 10.1016/j.inffus.2024.102742

Applications of knowledge distillation in remote sensing: A survey

With the ever-growing complexity of models in the field of remote sensing (RS), there is an increasing demand for solutions that balance model accuracy with computational efficiency. Knowledge distillation (KD) has emerged as a powerful tool to meet this need, enabling the transfer of knowledge from large, complex models to smaller, more efficient ones without significant loss in performance. This review article provides an extensive examination of KD and its innovative applications in RS. KD, a technique developed to transfer knowledge from a complex, often cumbersome model (teacher) to a more compact and efficient model (student), has seen significant evolution and application across various domains. Initially, we introduce the fundamental concepts and historical progression of KD methods. The advantages of employing KD are highlighted, particularly in terms of model compression, enhanced computational efficiency, and improved performance, which are pivotal for practical deployments in RS scenarios. The article provides a comprehensive taxonomy of KD techniques, where each category is critically analyzed to demonstrate the breadth and depth of the alternative options, and illustrates specific case studies that showcase the practical implementation of KD methods in RS tasks, such as instance segmentation and object detection. Further, the review discusses the challenges and limitations of KD in RS, including practical constraints and prospective future directions, providing a comprehensive overview for researchers and practitioners in the field of RS. Through this organization, the paper not only elucidates the current state of research in KD but also sets the stage for future research opportunities, thereby contributing significantly to both academic research and real-world applications.