FedCMD: A Federated Cross-Modal Knowledge Distillation for Drivers Emotion Recognition

IF 6.6 | Zone 4, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Intelligent Systems and Technology | Pub Date: 2024-03-01 | DOI: 10.1145/3650040

Saira Bano, Nicola Tonellotto, Pietro Cassarà, Alberto Gotta


Citations: 0

Abstract

FedCMD: A Federated Cross-Modal Knowledge Distillation for Drivers Emotion Recognition

Emotion recognition has attracted a lot of interest in recent years in various application areas such as healthcare and autonomous driving. Existing approaches to emotion recognition are based on visual, speech, or psychophysiological signals. However, recent studies are looking at multimodal techniques that combine different modalities for emotion recognition. In this work, we address the problem of recognizing the user’s emotion as a driver from unlabeled videos using multimodal techniques. We propose a collaborative training method based on cross-modal distillation, i.e., "FedCMD" (Federated Cross-Modal Distillation). Federated Learning (FL) is an emerging collaborative decentralized learning technique that allows each participant to train their model locally to build a better generalized global model without sharing their data. The main advantage of FL is that only local data is used for training, thus maintaining privacy and providing a secure and efficient emotion recognition system. The local model in FL is trained for each vehicle device with unlabeled video data by using sensor data as a proxy. Specifically, for each local model, we show how driver emotional annotations can be transferred from the sensor domain to the visual domain by using cross-modal distillation. The key idea is based on the observation that a driver’s emotional state indicated by a sensor correlates with facial expressions shown in videos. The proposed "FedCMD" approach is tested on the multimodal dataset "BioVid Emo DB" and achieves state-of-the-art performance. Experimental results show that our approach is robust to non-identically distributed data, achieving 96.67% and 90.83% accuracy in classifying five different emotions with IID (independently and identically distributed) and non-IID data, respectively. Moreover, our model is much more robust to overfitting, resulting in better generalization than the other existing methods.
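The two ingredients named in the abstract, cross-modal distillation on each vehicle and federated aggregation across vehicles, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the loss function or the aggregation rule, so the temperature-softened KL divergence and the sample-size-weighted averaging (FedAvg-style) used here are assumptions, and every function name and number below is hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions (assumed loss, not from the paper).

    The teacher stands in for a sensor-based model and the student for a video-based
    model that has no labels of its own.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts (assumed rule)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Hypothetical logits over five emotion classes for one video clip.
teacher = [2.0, 0.5, 0.1, -1.0, 0.0]        # sensor model output (proxy label)
student_close = [1.9, 0.6, 0.0, -0.9, 0.1]  # video model that agrees with the sensor model
student_far = [-1.0, 0.0, 2.0, 0.5, 0.1]    # video model that disagrees

loss_close = distillation_loss(teacher, student_close)
loss_far = distillation_loss(teacher, student_far)

# Two hypothetical vehicles with 1 and 3 local samples, two parameters each.
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In this toy setting the student that agrees with the sensor model incurs the smaller loss, and only parameter vectors, never video or sensor data, are passed to the aggregation step.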

Source journal

ACM Transactions on Intelligent Systems and Technology (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 9.30

Self-citation rate: 2.00%

Articles published: 131

About the journal: ACM Transactions on Intelligent Systems and Technology (ACM TIST) is a scholarly journal that publishes high-quality papers on intelligent systems, applicable algorithms, and technology from a multidisciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) that allow integrated systems to perceive, reason, learn, and act intelligently in the real world. ACM TIST publishes six issues a year. Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs, or detailed experimental results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.