Multimodal automatic assessment of acute pain through facial videos and heart rate signals utilizing transformer-based architectures

Stefanos Gkikas, N. Tachos, Stelios Andreadis, V. Pezoulas, D. Zaridis, George Gkois, Anastasia Matonaki, Thanos G. Stavropoulos, Dimitrios I. Fotiadis

Frontiers in Pain Research, published 2024-03-27. DOI: 10.3389/fpain.2024.1372814 (https://doi.org/10.3389/fpain.2024.1372814)
Abstract
Accurate and objective pain evaluation is crucial in developing effective pain management protocols, aiming to alleviate distress and prevent patients from experiencing decreased functionality. This study introduces a multimodal automatic assessment framework for acute pain that utilizes video and heart rate signals. The proposed framework comprises four pivotal modules: the Spatial Module, responsible for extracting embeddings from videos; the Heart Rate Encoder, tasked with mapping heart rate signals into a higher-dimensional space; the AugmNet, designed to create learning-based augmentations in the latent space; and the Temporal Module, which utilizes the extracted video and heart rate embeddings for the final assessment. The Spatial Module undergoes a two-stage pre-training strategy: first with a face recognition objective to learn universal facial features, and second with an emotion recognition objective in a multitask learning approach, enabling the extraction of high-quality embeddings for automatic pain assessment. Experiments with the facial videos and heart rate signals extracted from electrocardiograms of the BioVid database, along with a direct comparison to 29 studies, demonstrate state-of-the-art performance in both unimodal and multimodal settings while maintaining high efficiency. In the multimodal setting, the framework achieved 82.74% and 39.77% accuracy on the binary and multi-level pain classification tasks, respectively, using 9.62 million parameters for the entire framework.
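
To make the described pipeline concrete, the sketch below shows one way the four modules could be wired together in PyTorch. It is an illustrative reading of the abstract only: the module internals, embedding sizes, sequence lengths, and class counts are assumptions, and the learning-based AugmNet augmentation and the two-stage pre-training of the Spatial Module are omitted; this is not the authors' actual implementation.

```python
# Minimal sketch of the video + heart rate pain-assessment pipeline described
# in the abstract. All layer choices and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class SpatialModule(nn.Module):
    """Stand-in for the pre-trained vision backbone: maps each video frame to an embedding."""
    def __init__(self, frame_dim: int = 512, embed_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(frame_dim, embed_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_frames, frame_dim) -> (batch, num_frames, embed_dim)
        return self.proj(frames)


class HeartRateEncoder(nn.Module):
    """Lifts the 1-D heart rate signal into the same embedding space as the video frames."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(1, embed_dim)

    def forward(self, hr: torch.Tensor) -> torch.Tensor:
        # hr: (batch, num_samples) -> (batch, num_samples, embed_dim)
        return self.proj(hr.unsqueeze(-1))


class TemporalModule(nn.Module):
    """Transformer encoder over the concatenated video and heart rate embeddings."""
    def __init__(self, embed_dim: int = 128, num_classes: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        encoded = self.encoder(tokens)           # (batch, seq_len, embed_dim)
        return self.head(encoded.mean(dim=1))    # pool over time, then classify


# Toy forward pass: 16 video frames and 16 heart rate samples per clip (hypothetical sizes).
spatial, hr_enc, temporal = SpatialModule(), HeartRateEncoder(), TemporalModule()
frames = torch.randn(4, 16, 512)        # pre-extracted frame features
heart_rate = torch.randn(4, 16)         # heart rate derived from the ECG
tokens = torch.cat([spatial(frames), hr_enc(heart_rate)], dim=1)
logits = temporal(tokens)               # (4, 2) logits for binary pain classification
```

For the multi-level task reported in the abstract, the same structure would simply use a larger `num_classes` in the classification head; the concatenation of the two embedding streams before the Temporal Module is one plausible fusion choice consistent with the description above.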