MedViA: Empowering medical time series classification with vision augmentation and multimodal fusion
Wei Fan, Jingru Fei, Jindong Han, Jie Lian, Hangting Ye, Xiaozhuang Song, Xin Lv, Kun Yi, Min Li
Information Fusion, Volume 127, Article 103659
DOI: 10.1016/j.inffus.2025.103659
Published: 2025-09-20
The analysis of medical time series, such as Electrocardiography (ECG) and Electroencephalography (EEG), is fundamental to clinical diagnostics and patient monitoring. Accurate and automated classification of these signals can facilitate early disease detection and personalized treatment, thereby improving patient outcomes. Although deep learning models are widely adopted, they mainly process signals as sequential numerical data. Such a single-modality approach often misses the holistic visual patterns that clinicians readily recognize in graphical charts, and it struggles to model the complex non-linear dynamics of physiological data. As a result, the rich diagnostic cues contained in visual representations remain largely untapped, limiting model performance. To address these limitations, we propose MedViA, a novel multimodal learning framework that empowers Medical time series classification by integrating Vision Augmentation with numeric perception. Our core innovation is to augment the raw medical time series signals into the visual modality, enabling a dual-pathway architecture that computationally mimics the comprehensive reasoning of clinical experts. With this augmentation, MedViA features two parallel perception branches: a Visual Perception Module, built upon a novel Multi-resolution Differential Vision Transformer, processes the augmented images to capture high-level structural patterns and diagnostically critical waveform morphologies; concurrently, a Numeric Perception Module uses our proposed Temporal Kolmogorov Network to model fine-grained, non-linear dynamics directly from the raw time series. To synergistically integrate the insights from these dedicated pathways, we introduce a Medically-informed Hierarchical Multimodal Fusion strategy, which uses a late-fusion architecture and a hierarchical optimization objective to derive the final classification. Extensive experiments on multiple public medical time series datasets demonstrate the superior performance of our method compared with state-of-the-art approaches.
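The abstract does not detail how raw signals are rendered into the visual modality. As a rough illustration of the general idea only, the sketch below plots a 1-D signal as the kind of chart a clinician would read and returns it as an image array; all rendering choices (figure size, dpi, line style) are our assumptions, not the authors' implementation.

```python
# Illustrative sketch only: render a 1-D physiological signal (e.g., one ECG
# lead) into an RGB image array that a vision backbone could consume.
# Rendering settings here are assumptions, not the paper's pipeline.
import io

import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
import numpy as np


def signal_to_image(signal: np.ndarray, size: int = 224) -> np.ndarray:
    """Plot a 1-D signal and return it as a (size, size, 3) uint8 array."""
    fig = plt.figure(figsize=(size / 100, size / 100), dpi=100)
    ax = fig.add_axes([0.0, 0.0, 1.0, 1.0])  # axes fill the whole canvas
    ax.plot(signal, linewidth=1.0, color="black")
    ax.axis("off")

    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=100)
    plt.close(fig)

    buf.seek(0)
    image = plt.imread(buf)                          # float RGBA in [0, 1]
    return (image[..., :3] * 255).astype(np.uint8)   # drop alpha, to uint8


# Example: a synthetic 5-second "ECG-like" trace sampled at 250 Hz.
t = np.linspace(0, 5, 1250)
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)
img = signal_to_image(ecg_like)
print(img.shape)  # (224, 224, 3)
```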
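Likewise, the dual-pathway, late-fusion design described above can be paraphrased in a short PyTorch sketch. VisualEncoder and NumericEncoder below are generic placeholders standing in for the paper's Multi-resolution Differential Vision Transformer and Temporal Kolmogorov Network, and the per-branch losses are one plausible reading of the hierarchical optimization objective, not the authors' actual loss design.

```python
# Minimal sketch of a dual-pathway, late-fusion classifier. The two encoders
# are generic placeholders, NOT the paper's Multi-resolution Differential
# Vision Transformer or Temporal Kolmogorov Network.
import torch
import torch.nn as nn


class VisualEncoder(nn.Module):
    """Placeholder visual branch: a tiny CNN over the rendered chart image."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)  # (B, dim)


class NumericEncoder(nn.Module):
    """Placeholder numeric branch: a tiny 1-D CNN over the raw time series."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=2, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        return self.net(series)  # (B, dim)


class LateFusionClassifier(nn.Module):
    """Each branch keeps its own head; fused logits drive the prediction."""
    def __init__(self, num_classes: int, dim: int = 128):
        super().__init__()
        self.visual = VisualEncoder(dim)
        self.numeric = NumericEncoder(dim)
        self.visual_head = nn.Linear(dim, num_classes)
        self.numeric_head = nn.Linear(dim, num_classes)
        self.fusion_head = nn.Linear(2 * dim, num_classes)

    def forward(self, image, series):
        v = self.visual(image)    # visual features
        n = self.numeric(series)  # numeric features
        fused = self.fusion_head(torch.cat([v, n], dim=-1))
        return fused, self.visual_head(v), self.numeric_head(n)


model = LateFusionClassifier(num_classes=5)
image = torch.randn(4, 3, 224, 224)   # batch of rendered charts
series = torch.randn(4, 1, 1250)      # batch of raw signals
fused_logits, v_logits, n_logits = model(image, series)

# Hierarchical-style objective (assumed): supervise each branch and the
# fused output, weighting the branch losses below the fused loss.
labels = torch.randint(0, 5, (4,))
criterion = nn.CrossEntropyLoss()
loss = criterion(fused_logits, labels) + 0.5 * (
    criterion(v_logits, labels) + criterion(n_logits, labels)
)
print(loss.item())
```

Late fusion keeps the two branches independent until the classification stage, so each encoder can be designed and supervised for its own modality before their evidence is combined.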
Journal introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, and fosters collaboration among the diverse disciplines driving progress in the field. It is the leading outlet for sharing research and development in this area, with a focus on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.