MedViA: Empowering medical time series classification with vision augmentation and multimodal fusion
Wei Fan, Jingru Fei, Jindong Han, Jie Lian, Hangting Ye, Xiaozhuang Song, Xin Lv, Kun Yi, Min Li
Information Fusion, Volume 127, Article 103659
DOI: 10.1016/j.inffus.2025.103659
Published: 2025-09-20
The analysis of medical time series, such as Electrocardiography (ECG) and Electroencephalography (EEG), is fundamental to clinical diagnostics and patient monitoring. Accurate and automated classification of these signals can facilitate early disease detection and personalized treatment, thereby improving patient outcomes. Although deep learning models are widely adopted, they mainly process signals as sequential numerical data. Such a single-modality approach often misses the holistic visual patterns that clinicians readily recognize in graphical charts, and it struggles to model the complex non-linear dynamics of physiological data. As a result, the rich diagnostic cues contained in visual representations remain largely untapped, limiting model performance. To address these limitations, we propose MedViA, a novel multimodal learning framework that empowers Medical time series classification by integrating Vision Augmentation with numeric perception. Our core innovation is to augment the raw medical time series signals into the visual modality, enabling a dual-pathway architecture that computationally mimics the comprehensive reasoning of clinical experts. With this augmentation, MedViA features two parallel perception branches: a Visual Perception Module, built upon a novel Multi-resolution Differential Vision Transformer, processes the augmented images to capture high-level structural patterns and diagnostically critical waveform morphologies; concurrently, a Numeric Perception Module uses our proposed Temporal Kolmogorov Network to model fine-grained, non-linear dynamics directly from the raw time series. To synergistically integrate the insights from these dedicated pathways, we introduce a Medically-informed Hierarchical Multimodal Fusion strategy, which uses a late-fusion architecture and a hierarchical optimization objective to derive the final classification. Extensive experiments on multiple public medical time series datasets demonstrate the superior performance of our method compared with state-of-the-art approaches.
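The abstract does not detail how raw signals are rendered into the visual modality. As a rough illustration of the general idea only, the sketch below plots a 1-D signal as the kind of chart a clinician would read and returns it as an image array; all rendering choices (figure size, dpi, line style) are our assumptions, not the authors' implementation.

```python
# Illustrative sketch only: render a 1-D physiological signal (e.g., one ECG
# lead) into an RGB image array that a vision backbone could consume.
# Rendering settings here are assumptions, not the paper's pipeline.
import io

import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
import numpy as np


def signal_to_image(signal: np.ndarray, size: int = 224) -> np.ndarray:
    """Plot a 1-D signal and return it as a (size, size, 3) uint8 array."""
    fig = plt.figure(figsize=(size / 100, size / 100), dpi=100)
    ax = fig.add_axes([0.0, 0.0, 1.0, 1.0])  # axes fill the whole canvas
    ax.plot(signal, linewidth=1.0, color="black")
    ax.axis("off")

    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=100)
    plt.close(fig)

    buf.seek(0)
    image = plt.imread(buf)                          # float RGBA in [0, 1]
    return (image[..., :3] * 255).astype(np.uint8)   # drop alpha, to uint8


# Example: a synthetic 5-second "ECG-like" trace sampled at 250 Hz.
t = np.linspace(0, 5, 1250)
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)
img = signal_to_image(ecg_like)
print(img.shape)  # (224, 224, 3)
```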
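Likewise, the dual-pathway, late-fusion design described above can be paraphrased in a short PyTorch sketch. VisualEncoder and NumericEncoder below are generic placeholders standing in for the paper's Multi-resolution Differential Vision Transformer and Temporal Kolmogorov Network, and the per-branch losses are one plausible reading of the hierarchical optimization objective, not the authors' actual loss design.

```python
# Minimal sketch of a dual-pathway, late-fusion classifier. The two encoders
# are generic placeholders, NOT the paper's Multi-resolution Differential
# Vision Transformer or Temporal Kolmogorov Network.
import torch
import torch.nn as nn


class VisualEncoder(nn.Module):
    """Placeholder visual branch: a tiny CNN over the rendered chart image."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)  # (B, dim)


class NumericEncoder(nn.Module):
    """Placeholder numeric branch: a tiny 1-D CNN over the raw time series."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=2, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        return self.net(series)  # (B, dim)


class LateFusionClassifier(nn.Module):
    """Each branch keeps its own head; fused logits drive the prediction."""
    def __init__(self, num_classes: int, dim: int = 128):
        super().__init__()
        self.visual = VisualEncoder(dim)
        self.numeric = NumericEncoder(dim)
        self.visual_head = nn.Linear(dim, num_classes)
        self.numeric_head = nn.Linear(dim, num_classes)
        self.fusion_head = nn.Linear(2 * dim, num_classes)

    def forward(self, image, series):
        v = self.visual(image)    # visual features
        n = self.numeric(series)  # numeric features
        fused = self.fusion_head(torch.cat([v, n], dim=-1))
        return fused, self.visual_head(v), self.numeric_head(n)


model = LateFusionClassifier(num_classes=5)
image = torch.randn(4, 3, 224, 224)   # batch of rendered charts
series = torch.randn(4, 1, 1250)      # batch of raw signals
fused_logits, v_logits, n_logits = model(image, series)

# Hierarchical-style objective (assumed): supervise each branch and the
# fused output, weighting the branch losses below the fused loss.
labels = torch.randint(0, 5, (4,))
criterion = nn.CrossEntropyLoss()
loss = criterion(fused_logits, labels) + 0.5 * (
    criterion(v_logits, labels) + criterion(n_logits, labels)
)
print(loss.item())
```

Late fusion keeps the two branches independent until the classification stage, so each encoder can be designed and supervised for its own modality before their evidence is combined.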
Journal introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, and fosters collaboration among the diverse disciplines driving progress in the field. It is the leading outlet for sharing research and development in this area, with a focus on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.