M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment
Long Nguyen-Phuoc, Rénald Gaboriau, Dimitri Delacroix, Laurent Navarro
{"title":"M&M:在认知负荷评估中整合视听线索的多模式多任务模型","authors":"Long Nguyen-Phuoc, Rénald Gaboriau, Dimitri Delacroix, Laurent Navarro","doi":"10.5220/0012575100003660","DOIUrl":null,"url":null,"abstract":"This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M uniquely integrates audiovisual cues through a dual-pathway architecture, featuring specialized streams for audio and video inputs. A key innovation lies in its cross-modality multihead attention mechanism, fusing the different modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. While it shows modest performance compared to the AVCAffe's single-task baseline, M\\&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.","PeriodicalId":517797,"journal":{"name":"Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications","volume":" 32","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment\",\"authors\":\"Long Nguyen-Phuoc, Rénald Gaboriau, Dimitri Delacroix, Laurent Navarro\",\"doi\":\"10.5220/0012575100003660\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M uniquely integrates audiovisual cues through a dual-pathway architecture, featuring specialized streams for audio and video inputs. A key innovation lies in its cross-modality multihead attention mechanism, fusing the different modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. While it shows modest performance compared to the AVCAffe's single-task baseline, M\\\\&M demonstrates a promising framework for integrated multimodal processing. 
This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.\",\"PeriodicalId\":517797,\"journal\":{\"name\":\"Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications\",\"volume\":\" 32\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5220/0012575100003660\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5220/0012575100003660","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment
This paper introduces the M&M model, a novel multimodal-multitask learning framework applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M integrates audiovisual cues through a dual-pathway architecture with specialized streams for audio and video inputs. A key innovation is its cross-modality multihead attention mechanism, which fuses the two modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. While M&M shows modest performance compared to AVCAffe's single-task baseline, it demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.
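The architecture described in the abstract (two modality streams, cross-modality multihead attention, and three per-label heads) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the class name `MMSketch`, all dimensions, the mean-pooling fusion, and the example label count are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class MMSketch(nn.Module):
    """Hypothetical sketch of a dual-pathway multimodal-multitask model:
    separate audio/video projections, cross-modality multihead attention,
    and three task-specific branches (one per cognitive load label)."""

    def __init__(self, audio_dim=128, video_dim=512, d_model=256,
                 n_heads=4, n_classes=3, n_tasks=3):
        super().__init__()
        # Project each modality into a shared embedding space (placeholder
        # for whatever pretrained encoders the paper actually uses).
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.video_proj = nn.Linear(video_dim, d_model)
        # Cross-modality attention: audio attends to video, and vice versa.
        self.a2v_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.v2a_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # One specialized branch per cognitive load label.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * d_model, d_model), nn.ReLU(),
                          nn.Linear(d_model, n_classes))
            for _ in range(n_tasks)
        )

    def forward(self, audio_feats, video_feats):
        # audio_feats: (B, Ta, audio_dim); video_feats: (B, Tv, video_dim)
        a = self.audio_proj(audio_feats)
        v = self.video_proj(video_feats)
        a_fused, _ = self.a2v_attn(a, v, v)  # audio queries, video keys/values
        v_fused, _ = self.v2a_attn(v, a, a)  # video queries, audio keys/values
        # Pool over time and concatenate the two fused streams.
        fused = torch.cat([a_fused.mean(dim=1), v_fused.mean(dim=1)], dim=-1)
        return [head(fused) for head in self.heads]  # one logit set per task

# Usage: random tensors standing in for precomputed audio/video features.
model = MMSketch()
logits = model(torch.randn(2, 50, 128), torch.randn(2, 16, 512))
print([l.shape for l in logits])  # three (2, 3) tensors, one per task branch
```

Returning one logit set per branch lets a single backbone be trained jointly on all three cognitive load labels, which is the "synchronized multitasking" the abstract refers to.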