M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment

Long Nguyen-Phuoc, Rénald Gaboriau, Dimitri Delacroix, Laurent Navarro
{"title":"M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment","authors":"Long Nguyen-Phuoc, Rénald Gaboriau, Dimitri Delacroix, Laurent Navarro","doi":"10.5220/0012575100003660","DOIUrl":null,"url":null,"abstract":"This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M uniquely integrates audiovisual cues through a dual-pathway architecture, featuring specialized streams for audio and video inputs. A key innovation lies in its cross-modality multihead attention mechanism, fusing the different modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. While it shows modest performance compared to the AVCAffe's single-task baseline, M\\&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.","PeriodicalId":517797,"journal":{"name":"Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications","volume":" 32","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5220/0012575100003660","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M uniquely integrates audiovisual cues through a dual-pathway architecture, featuring specialized streams for audio and video inputs. A key innovation lies in its cross-modality multihead attention mechanism, fusing the different modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. While it shows modest performance compared to AVCAffe's single-task baseline, M&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.
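The architecture sketched in the abstract, i.e. two modality-specific streams, cross-modality multihead attention for fusion, and three task-specific branches, can be illustrated with a minimal PyTorch sketch. Everything concrete here is an assumption for illustration, not taken from the paper: the feature dimensions (128-dim audio, 512-dim video), the embedding size, the choice of audio tokens as queries against video keys/values, and the mean-pooling step.

```python
# Minimal sketch of a dual-pathway multimodal-multitask model with
# cross-modality multihead attention. All dimensions and design details
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class MMSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_tasks=3, n_classes=2):
        super().__init__()
        # Dual-pathway encoders: project per-frame audio/video features
        # (input dims assumed) into a shared embedding space.
        self.audio_enc = nn.Sequential(nn.Linear(128, d_model), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(512, d_model), nn.ReLU())
        # Cross-modality multihead attention: one stream supplies queries,
        # the other supplies keys and values.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Three task-specific branches, one per cognitive load label.
        self.heads = nn.ModuleList([nn.Linear(d_model, n_classes) for _ in range(n_tasks)])

    def forward(self, audio, video):
        a = self.audio_enc(audio)            # (B, Ta, d_model)
        v = self.video_enc(video)            # (B, Tv, d_model)
        fused, _ = self.cross_attn(a, v, v)  # audio queries attend to video
        pooled = fused.mean(dim=1)           # temporal average pooling (assumed)
        return [head(pooled) for head in self.heads]  # one logit set per task

# Usage with random stand-in features:
model = MMSketch()
audio = torch.randn(4, 50, 128)  # batch of 4, 50 audio frames
video = torch.randn(4, 30, 512)  # batch of 4, 30 video frames
outputs = model(audio, video)    # three logit tensors, one per task
```

Routing one modality's tokens as queries against the other's keys/values is one common way to realize cross-modal attention; the paper's exact query/key assignment and per-branch losses may differ.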
M&M:在认知负荷评估中整合视听线索的多模式多任务模型
本文介绍了 M&M 模型,这是一种新颖的多模态多任务学习框架,适用于认知负荷评估(CLA)的 AVCAffe 数据集。M&M 通过双通道架构,以音频和视频输入专用流为特点,独特地整合了视听线索。其关键创新在于跨模态多头注意力机制,可融合不同模态进行同步多任务处理。该模型的另一个显著特点是它有三个专门分支,每个分支都针对特定的认知负荷标签,从而能够进行细致入微的特定任务分析。虽然与 AVCAffe 的单任务基线相比,M/&M 的性能并不突出,但它展示了一个很有前景的多模态综合处理框架。这项工作为未来多模态多任务学习系统的改进铺平了道路,强调融合不同的数据类型来处理复杂的任务。
本文章由计算机程序翻译,如有差异,请以英文原文为准。