Neurocomputing · Pub Date: 2026-05-01 · Epub Date: 2026-02-03 · DOI: 10.1016/j.neucom.2026.132933
Sai Munikoti, Ian Stewart, Sameera Horawalavithana, Henry Kvinge, Tegan Emerson, Sandra Thompson, Karl Pazdernik
{"title":"Generalist multimodal AI: A review of architectures, challenges and opportunities","authors":"Sai Munikoti, Ian Stewart, Sameera Horawalavithana, Henry Kvinge, Tegan Emerson, Sandra Thompson, Karl Pazdernik","doi":"10.1016/j.neucom.2026.132933","DOIUrl":"10.1016/j.neucom.2026.132933","url":null,"abstract":"<div><div>Multimodal models are expected to be a critical component of future advances in artificial intelligence. This field is starting to grow rapidly with a surge of new design elements motivated by the success of foundation models in natural language processing (NLP) and vision. It is widely hoped that further extending foundation models to multiple modalities (e.g., text, image, video, sensor, time series, graph, etc.) will ultimately lead to generalist multimodal models, i.e., one model across different data modalities and tasks. However, little research systematically analyzes recent multimodal models (particularly those that work beyond text and vision) with respect to their underlying architectures. Therefore, this work provides a fresh perspective on generalist multimodal models (GMMs) via a novel architecture- and training-configuration-specific taxonomy. This includes factors such as <em>Unifiability</em>, <em>Modularity</em>, and <em>Adaptability</em> that are pertinent and essential to the wide adoption and application of GMMs.
The review further highlights key challenges and prospects for the field and guides researchers through new advancements.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"676 ","pages":"Article 132933"},"PeriodicalIF":6.5,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147386360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing · Pub Date: 2026-05-01 · Epub Date: 2026-02-12 · DOI: 10.1016/j.neucom.2026.133017
Zhiwei Hu , Víctor Gutiérrez-Basulto , Zhiliang Xiang , Ru Li , Jeff Z. Pan
{"title":"Leveraging intra-modal and inter-modal interaction for multi-modal entity alignment","authors":"Zhiwei Hu , Víctor Gutiérrez-Basulto , Zhiliang Xiang , Ru Li , Jeff Z. Pan","doi":"10.1016/j.neucom.2026.133017","DOIUrl":"10.1016/j.neucom.2026.133017","url":null,"abstract":"<div><div>Multi-modal entity alignment (MMEA) aims to identify equivalent entity pairs across different multi-modal knowledge graphs (MMKGs). Existing approaches focus on how to better encode and aggregate information from different modalities. However, it is not trivial to leverage multi-modal knowledge in entity alignment due to modal heterogeneity. In this paper, we propose a <strong>M</strong>ulti-Grained <strong>I</strong>nteraction framework for <strong>M</strong>ulti-Modal <strong>E</strong>ntity <strong>A</strong>lignment (<strong>MIMEA</strong>), which effectively realizes multi-granular interaction within the same modality or between different modalities. MIMEA is composed of four modules: i) a <em>Multi-modal Knowledge Embedding</em> module, which extracts modality-specific representations with multiple individual encoders; ii) a <em>Probability-guided Modal Fusion</em> module, which employs a probability-guided approach to integrate uni-modal representations into joint-modal embeddings, while considering the interaction between uni-modal representations; iii) an <em>Optimal Transport Modal Alignment</em> module, which introduces an optimal transport mechanism to encourage the interaction between uni-modal and joint-modal embeddings; iv) a <em>Modal-adaptive Contrastive Learning</em> module, which distinguishes the embeddings of equivalent entities from those of non-equivalent ones, for each modality. Extensive experiments conducted on two real-world datasets demonstrate the strong performance of MIMEA compared to state-of-the-art methods.
Datasets and code are available at the following website: <span><span>https://github.com/zhiweihu1103/MEA-MIMEA</span></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"676 ","pages":"Article 133017"},"PeriodicalIF":6.5,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147386721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
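The abstract does not spell out MIMEA's optimal transport formulation. As a hedged illustration of the general idea behind the Optimal Transport Modal Alignment module, the sketch below computes an entropic-regularized transport plan between uni-modal and joint-modal embeddings with the standard Sinkhorn-Knopp iteration; the function name, uniform marginals, and squared-distance cost are assumptions for illustration, not the paper's code.

```python
import numpy as np

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic-regularized optimal transport via Sinkhorn-Knopp.

    cost: (n, m) pairwise cost between uni-modal and joint-modal
    embeddings. Uniform marginals are assumed for illustration.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # source marginal (uni-modal side)
    b = np.full(m, 1.0 / m)          # target marginal (joint-modal side)
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # alternate scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan

# toy usage: align 4 uni-modal embeddings with 4 joint-modal embeddings
rng = np.random.default_rng(0)
uni = rng.normal(size=(4, 8))
joint = rng.normal(size=(4, 8))
cost = ((uni[:, None, :] - joint[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()             # normalize so exp(-cost/eps) stays stable
plan = sinkhorn_plan(cost)
```

The resulting plan's marginals match `a` and `b`, and its entries indicate how strongly each uni-modal embedding should "interact" with each joint-modal embedding.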
Neurocomputing · Pub Date: 2026-05-01 · Epub Date: 2026-02-13 · DOI: 10.1016/j.neucom.2026.133057
Yan Huang , Hongxin Fu , Zhonghang Li , Yongcan Luo , Tianyi Chen , Si Wu
{"title":"MVHDiff: Leveraging 3D priors for consistent multi-view human image generation with diffusion models","authors":"Yan Huang , Hongxin Fu , Zhonghang Li , Yongcan Luo , Tianyi Chen , Si Wu","doi":"10.1016/j.neucom.2026.133057","DOIUrl":"10.1016/j.neucom.2026.133057","url":null,"abstract":"<div><div>Text-to-image models are increasingly applied to human image generation, leveraging multimodal information under multiple conditions to produce high-quality human images. Despite their ability to generate detailed images, these models often struggle to maintain perceptual consistency across multiple viewpoints. To address this limitation, we propose Multi-View Human Diffusion (MVHDiff), a novel framework that integrates 3D human model priors and text prompts to generate high-quality, multi-view-consistent human images. MVHDiff separately acquires textual descriptions of human appearance and pose, as well as spatial information regarding the subject’s orientation relative to the camera. Subsequently, a perceptual fusion module is employed to align these text features with the visual features extracted from the human image, thereby enabling the fused learning of prior information and image features. Further, MVHDiff finetunes both appearance descriptions and spatial viewpoint-related textual inputs, enabling precise text-based control over human attributes while ensuring semantic consistency across different spatial viewpoints. 
Experimental results demonstrate that MVHDiff significantly outperforms existing methods in generating text-guided human attributes with consistent multi-view representations, offering a robust solution for high-quality, text-driven human image generation.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"676 ","pages":"Article 133057"},"PeriodicalIF":6.5,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147386791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive gated diffusion model network for asymmetric modalities applied to cross-domain fault diagnosis","authors":"Mingqi Li, Lei Yin, Qibin Wang, Wangshu Gao, Dawei Chen, Liang Xiao, Dinglong Zheng","doi":"10.1016/j.neucom.2026.133835","DOIUrl":"10.1016/j.neucom.2026.133835","url":null,"abstract":"<div><div>Intelligent fault diagnosis technology based on multimodal signal fusion has demonstrated performance superior to that of single-modal approaches. However, in real-world industrial scenarios, target devices often fail to capture multimodal data as comprehensively as in laboratory settings due to cost or deployment constraints, resulting in missing modalities. Simultaneously, domain shifts caused by varying operating conditions further degrade model performance. To address this issue, this paper proposes an adaptive gated diffusion model network (AGDMN), in which the source domain contains labeled multimodal data, while the target domain has only unlabeled partial-modal data. AGDMN first performs feature extraction and disentanglement, separating shared and private features across modalities and domains to facilitate subsequent feature fusion and cross-modal generation. Then, a conditional diffusion model is trained on the source domain, using vibration features as conditions to learn the complex mapping for generating high-fidelity sound or current features. In the target domain, this model uses vibration features as input to reconstruct missing features. An adaptive gated fusion module is designed, which includes a task-aware scoring network that dynamically evaluates the contribution of each modality to the diagnostic task. Combined with a cross-modal attention mechanism, it generates adaptive gating weights to achieve robust weighted fusion of features.
Experimental results on three cross-condition tasks demonstrate that the AGDMN model achieves superior performance.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"691 ","pages":"Article 133835"},"PeriodicalIF":6.5,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147827480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
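The abstract describes the adaptive gated fusion module only at a high level. As a hedged sketch of the score-then-gate pattern it names, the snippet below uses a linear scoring head (a stand-in for the paper's task-aware scoring network; the cross-modal attention step is omitted) and a softmax over modalities to produce gating weights for a weighted fusion. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(features, w_score, b_score=0.0):
    """features: dict of modality name -> (d,) feature vector.

    A linear scoring head maps each modality feature to a scalar
    score; a softmax over modalities turns the scores into gating
    weights, and the fused feature is the gate-weighted sum.
    """
    names = sorted(features)
    scores = np.array([features[n] @ w_score + b_score for n in names])
    gates = softmax(scores)                       # one weight per modality
    fused = sum(g * features[n] for g, n in zip(gates, names))
    return fused, dict(zip(names, gates))

# toy usage with the three signal types mentioned in the abstract
rng = np.random.default_rng(1)
d = 16
feats = {"vibration": rng.normal(size=d),
         "sound": rng.normal(size=d),
         "current": rng.normal(size=d)}
fused, gates = gated_fusion(feats, w_score=rng.normal(size=d))
```

Because the gates are a softmax output, a modality judged uninformative for the diagnostic task is smoothly down-weighted rather than hard-dropped.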
Neurocomputing · Pub Date: 2026-05-01 · Epub Date: 2026-02-11 · DOI: 10.1016/j.neucom.2026.132989
Wei Hou , Xianxing Liu , Linxiao Li , Chunling Fu
{"title":"Hierarchy-aware graph neural network and inverse-variance reinforcement learning for drug recommendation","authors":"Wei Hou , Xianxing Liu , Linxiao Li , Chunling Fu","doi":"10.1016/j.neucom.2026.132989","DOIUrl":"10.1016/j.neucom.2026.132989","url":null,"abstract":"<div><div>Drug recommendation (DR) based on artificial intelligence plays a crucial role in healthcare research, offering precise and effective drug prescription suggestions for doctors. However, existing methods typically model DR as a sequential task, overlooking the complex correlations among medical entities present in electronic medical records (EMRs). To this end, we propose a novel DR model that integrates a hierarchy-aware graph neural network (GNN) with inverse-variance (IV) reinforcement learning (RL). Specifically, we represent patient and drug information using a knowledge graph, and employ a hyperbolic space-embedded GNN to encode the hierarchical structure among graph nodes. We propose an IV-RL mechanism to reduce the model's excessive exploration of inefficient or noisy data. By incorporating IV into the RL framework, the model can sample more efficiently from the training data, thereby enhancing learning performance. Extensive experiments on the widely used MIMIC-III, MIMIC-IV, and eICU datasets demonstrate that our proposed method achieves superior performance and exhibits reliable DR capabilities.
We believe that our proposed method provides a promising solution for accurate and effective DR, and opens up new opportunities for further research.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"676 ","pages":"Article 132989"},"PeriodicalIF":6.5,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147386724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
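The abstract does not specify how inverse variance enters the sampler, so the following is only one plausible reading, sketched under that assumption: datapoints whose observed returns are noisier receive sampling probability proportional to the inverse of their return variance, which directly discourages "excessive exploration of inefficient or noisy data". The function name and the per-datapoint return lists are illustrative, not the paper's implementation.

```python
import numpy as np

def inverse_variance_weights(return_samples, eps=1e-6):
    """Turn per-datapoint lists of observed returns into sampling
    probabilities proportional to 1 / (variance + eps), so noisy
    datapoints are drawn less often during training."""
    variances = np.array([np.var(r) for r in return_samples])
    w = 1.0 / (variances + eps)
    return w / w.sum()

# datapoint 0 has low-variance returns, datapoint 1 high-variance
probs = inverse_variance_weights([[1.0, 1.1, 0.9],
                                  [0.0, 2.0, -2.0]])
```

Inverse-variance weighting is the precision-weighting scheme familiar from meta-analysis; the `eps` term simply guards against division by zero for datapoints with constant returns.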
Neurocomputing · Pub Date: 2026-05-01 · Epub Date: 2026-02-13 · DOI: 10.1016/j.neucom.2026.133066
Xue Wu , Jingwei Xin , Jun Hao , Hui Gao , Jie Li , Nannan Wang , Xinbo Gao
{"title":"One-step diffusion-based real-world image super-resolution with visual perception distillation","authors":"Xue Wu , Jingwei Xin , Jun Hao , Hui Gao , Jie Li , Nannan Wang , Xinbo Gao","doi":"10.1016/j.neucom.2026.133066","DOIUrl":"10.1016/j.neucom.2026.133066","url":null,"abstract":"<div><div>Diffusion-based models have been widely used in various visual generation tasks, showing promising results in image super-resolution (SR), while typically being limited by dozens or even hundreds of sampling steps. Although existing methods aim to accelerate the inference speed of multi-step diffusion-based SR methods through knowledge distillation, their generated images exhibit insufficient semantic alignment with real images, resulting in suboptimal perceptual quality, as reflected in the CLIPIQA score. These methods still face many challenges in perceptual quality and semantic fidelity. To address these challenges, we propose VPD-SR, a novel visual perception diffusion distillation framework specifically designed for SR, aiming to construct an effective and efficient one-step SR model. Specifically, VPD-SR consists of two components: Explicit Semantic-aware Supervision (ESS) and a High-Frequency Perception (HFP) loss. First, the ESS leverages the powerful visual perceptual understanding capabilities of the CLIP model to extract explicit semantic supervision, thereby enhancing semantic consistency. Then, considering that high-frequency information contributes to the visual perception quality of images, in addition to the vanilla distillation loss, the HFP loss guides the student model to restore the missing high-frequency details in degraded images that are critical for enhancing perceptual quality. Lastly, we expand VPD-SR in an adversarial training manner to further enhance the authenticity of the generated content.
Extensive experiments conducted on synthetic and real-world datasets demonstrate that the proposed VPD-SR achieves superior performance compared to both previous state-of-the-art methods and the teacher model with just one-step sampling.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"676 ","pages":"Article 133066"},"PeriodicalIF":6.5,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147386787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
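The abstract does not give the HFP loss in closed form. A common way to realize a high-frequency loss, sketched here as an assumption rather than the paper's definition, is to extract high-frequency components with a Laplacian filter and penalize the L1 distance between the student's and reference image's filtered responses.

```python
import numpy as np

# 3x3 Laplacian kernel: responds to edges/texture, zero on flat regions
LAPLACIAN = np.array([[0., -1., 0.],
                      [-1., 4., -1.],
                      [0., -1., 0.]])

def high_pass(img):
    """Valid-region 3x3 Laplacian filtering of a 2-D grayscale image,
    used as a simple high-frequency extractor."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = (img[i:i + 3, j:j + 3] * LAPLACIAN).sum()
    return out

def hfp_loss(student, reference):
    """Mean L1 distance between high-frequency components."""
    return np.abs(high_pass(student) - high_pass(reference)).mean()

rng = np.random.default_rng(2)
ref = rng.random((16, 16))
```

A flat (detail-free) prediction is penalized by exactly the high-frequency energy it failed to reproduce, which matches the stated goal of steering the one-step student toward the missing fine details.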
{"title":"A systematic review of machine learning for digital stain processing in pathology","authors":"Rabiah Al-Qudah , Abubakar Bala , Mrouj Almuhajri , Khiati Zakaria , Ching Y. Suen","doi":"10.1016/j.neucom.2026.133064","DOIUrl":"10.1016/j.neucom.2026.133064","url":null,"abstract":"<div><div>Digital staining involves using methods such as Machine Learning (ML) to replace chemical staining in pathology. Staining adds contrast that makes cell details more visible under the microscope. However, chemical methods are slow, use toxic reagents, and require skilled personnel. In contrast, digital staining can generate images faster, reduce the need for reagents and specialized equipment, and minimize plastic and chemical waste, making the workflow more sustainable. This paper systematically reviews papers published on ML-based digital stain processing. We propose a new taxonomy that divides existing studies into five groups: stain normalization, stain augmentation, virtual staining, stain transformation, and hybrid approaches. In addition, we observed several trends from the reviewed papers. Finally, we outline open research directions.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"676 ","pages":"Article 133064"},"PeriodicalIF":6.5,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147386368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing · Pub Date: 2026-05-01 · Epub Date: 2026-02-11 · DOI: 10.1016/j.neucom.2026.133026
Mikio Kofune , Kensei Monden , Suguru Yamanaka
{"title":"Enhancing accuracy and interpretability in corporate credit rating classification with the transformer-LSTM model","authors":"Mikio Kofune , Kensei Monden , Suguru Yamanaka","doi":"10.1016/j.neucom.2026.133026","DOIUrl":"10.1016/j.neucom.2026.133026","url":null,"abstract":"<div><div>Corporate credit rating classification is essential for assessing a corporation’s debt repayment ability. Previous research has demonstrated that neural network models exhibit high classification accuracy, particularly when incorporating time-series features. However, a significant challenge remains regarding their interpretability, often limited by nonlinear and intricate calculation processes. To address this trade-off between interpretability and the utilization of time-series features, we introduce a novel approach: the Transformer-Long Short-Term Memory (T-LSTM). Specifically, the attention matrix embedded within the T-LSTM architecture provides interpretability by revealing the temporal importance of features. Comparative experiments show that our T-LSTM model surpasses standard machine learning baselines in the long-history setting. Empirical results demonstrate that the proposed model, trained on a 10-year history of longitudinal financial ratios, yields an absolute accuracy improvement of approximately 1 percentage point when compared with a strong sequential baseline such as a Long Short-Term Memory (LSTM) model trained on the same 10-year history, and up to about 17 percentage points when compared with representative single-year “snapshot” baselines that use only the most recent year’s ratios. Furthermore, the attention matrix successfully visualizes specific time points where information is most critical for rating classification. 
Consequently, the proposed model offers a highly accurate and interpretable solution for credit rating classification in the financial industry.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"676 ","pages":"Article 133026"},"PeriodicalIF":6.5,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147386380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
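The interpretability claim rests on reading temporal importance off an attention matrix. As a hedged, minimal illustration of that idea (single head, no learned projections, so not the T-LSTM architecture itself), the sketch below applies scaled dot-product self-attention over a 10-year sequence of financial-ratio vectors; row t of the returned matrix shows how much each year contributes to year t.

```python
import numpy as np

def temporal_attention(X):
    """Scaled dot-product self-attention over a (T, d) sequence.

    Returns the attended output sequence and the (T, T) attention
    matrix, whose rows are softmax-normalized temporal-importance
    weights of the kind an analyst could inspect.
    """
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                 # (T, T) similarity
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # rows sum to 1
    return A @ X, A

# toy usage: 10 years x 6 financial ratios
rng = np.random.default_rng(3)
ratios = rng.normal(size=(10, 6))
out, attn = temporal_attention(ratios)
```

Because each row of `attn` is a probability distribution over years, "which time points matter most" can be visualized directly as a heat map, which is the kind of evidence the paper reports for its attention matrix.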
Neurocomputing · Pub Date: 2026-05-01 · Epub Date: 2026-02-14 · DOI: 10.1016/j.neucom.2026.133012
Seongyeop Yang , Minho Kim , Byeongkeun Kang , Yeejin Lee
{"title":"XID: A protocol for evaluating identity consistency under domain shifts and reidentification method","authors":"Seongyeop Yang , Minho Kim , Byeongkeun Kang , Yeejin Lee","doi":"10.1016/j.neucom.2026.133012","DOIUrl":"10.1016/j.neucom.2026.133012","url":null,"abstract":"<div><div>Recent domain generalization methods for person reidentification aim to learn features that remain discriminative across domains to improve performance in unseen environments. Prior work has addressed domain shift through discrepancy reduction and alternative normalization strategies, while maintaining identity separability. However, these evaluations often rely on simplified settings with non-overlapping identities and limited visual diversity. To address this, we propose a new evaluation protocol that introduces identity transfer and significant appearance variation by constructing query and gallery sets from different domains. This setup enables a more realistic assessment of intra-class variation and inter-class discriminability. We further develop a learning framework specifically designed for this protocol, which enhances generalization by regulating achromatic information and projecting embeddings into a space that simulates unseen domains. The framework includes a self-regulating augmentation policy that adjusts transformation strength during training. 
Extensive experiments show consistent performance gains under both the proposed and standard protocols, establishing a more rigorous and practical benchmark.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"676 ","pages":"Article 133012"},"PeriodicalIF":6.5,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147386318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing · Pub Date: 2026-05-01 · Epub Date: 2026-02-14 · DOI: 10.1016/j.neucom.2026.133082
Jun Hu , Jie Wu , Xisheng Zhan , Tao Han , Huaicheng Yan
{"title":"Data-driven adaptive secure collision-free formation tracking of networked marine surface vehicles under DoS attacks","authors":"Jun Hu , Jie Wu , Xisheng Zhan , Tao Han , Huaicheng Yan","doi":"10.1016/j.neucom.2026.133082","DOIUrl":"10.1016/j.neucom.2026.133082","url":null,"abstract":"<div><div>This paper addresses the adaptive path planning and secure formation tracking (SFT) control problem for networked marine surface vehicles (NMSVs) subject to denial-of-service (DoS) attacks, model uncertainty, external disturbances, and actuator faults. A hierarchical adaptive formation planning and control (HAFPC) framework is proposed. In its path planning layer, reinforcement learning (RL) based greedy path map inference (GPMI) infers local maps in unknown environments, while a Euclidean-distance-field-based formation path planning (EDF-FPP) algorithm finds collision-free trajectories. In the network layer, a distributed resilient estimator is designed to accurately estimate the virtual leader information under DoS attacks in directed graphs. In the control layer, a neural network (NN)-based data-driven observer is first employed to address model uncertainty. Then, an adaptive offset function and a data-driven observer-based control (DDOBC) algorithm are adopted to achieve SFT, obstacle avoidance, and handle input saturation and actuator faults. Lyapunov stability theory establishes sufficient conditions for system convergence and stabilization. 
Numerical simulations validate the proposed framework’s effectiveness.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"676 ","pages":"Article 133082"},"PeriodicalIF":6.5,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147386369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}