Generalized multisensor wearable signal fusion for emotion recognition from noisy and incomplete data
Vamsi Kumar Naidu Pallapothula, Sidharth Anand, Sreyasee Das Bhattacharjee, Junsong Yuan
Smart Health, vol. 36, Article 100571 (published 2025-03-24). JCR: Q2, Health Professions
DOI: 10.1016/j.smhl.2025.100571
https://www.sciencedirect.com/science/article/pii/S2352648325000327
Citation count: 0
Abstract
Continual real-time monitoring of users’ health via noninvasive wearable devices (e.g., smartwatches, smartphones) shows significant potential to enhance human well-being in everyday life. However, because sensors differ in sampling rate, noise sensitivity, and data type, the inherent heterogeneity of the signals received from multiple sensors makes biosignal-based emotion recognition both complex and time-consuming. Optimally fusing multimode information (where each sensor produces a unique mode-specific input signal) to ensure reliable inference remains difficult, and the particular challenges in this problem setting are primarily threefold: (1) data availability is limited due to several unique person- and device-specific properties and the high cost of labeling; (2) the signals acquired from wearable devices are often noisy and may also be lossy due to users’ personal lifestyle choices or environmental interference; (3) due to several intra-individual and inter-individual signal variabilities, achieving model generalizability is difficult. To this end, we propose a general-purpose multisensor fusion network, GM-FuseNet, that can seamlessly integrate and transform multisensor signal information for a variety of tasks. Unlike the majority of existing works, which rely on the fundamental assumption that full multimode query information is present during inference, GM-FuseNet’s first-level preface multimodal transformer module is explicitly designed to enhance both unimodal and multimodal performance when only partial modality information is available. We also use a multimodal temporal correlation loss that aligns the unimode signals pairwise in the temporal domain and encourages the model to learn the temporal correlation across multiple sensor-specific signals.
Extensive evaluation on two public datasets, WESAD and CASE, shows that the proposed GM-FuseNet outperforms state-of-the-art supervised or self-supervised models by 1–4% while generalizing consistently and robustly across settings. Additionally, with a further 2–4% improvement in accuracy and F1-score, GM-FuseNet shows significant promise in handling a variety of test environments, including missing and noisy multisensor query signals.
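The abstract does not give the form of the multimodal temporal correlation loss, so the following is only a minimal sketch of one plausible reading: penalize low Pearson correlation between every pair of available modality sequences. The function names, the dictionary layout, the use of `None` for a missing modality, and the reduction of each modality to one scalar per time step are all assumptions made for illustration, not details taken from GM-FuseNet.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def temporal_correlation_loss(embeddings):
    """Mean of (1 - correlation) over all pairs of available modalities.

    `embeddings` maps a modality name to a per-time-step scalar sequence
    (hypothetical layout); a value of None marks a missing modality.
    """
    available = {name: seq for name, seq in embeddings.items() if seq is not None}
    pairs = list(combinations(sorted(available), 2))
    if not pairs:
        return 0.0  # fewer than two modalities present: nothing to align
    total = sum(1.0 - pearson(available[a], available[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical example: two temporally aligned signals and one missing sensor.
signals = {
    "ecg": [0.1, 0.4, 0.9, 0.4, 0.1],
    "eda": [0.2, 0.8, 1.8, 0.8, 0.2],  # same temporal shape as "ecg", scaled by 2
    "temp": None,                       # sensor dropped out
}
loss = temporal_correlation_loss(signals)
```

Under this formulation the loss is near zero when the available signals rise and fall together and grows toward 2 as they become anti-correlated. Skipping missing modalities keeps the term defined when only part of the sensor set is present, which matches the partial-modality setting described in the abstract.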