Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues

arXiv preprint · Published: 2024-01-05 · DOI: 10.48550/arXiv.2401.02746
David Gimeno-Gómez, Ana-Maria Bucur, Adrian Cosma, Carlos-D. Martínez-Hinarejos, Paolo Rosso
{"title":"Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues","authors":"David Gimeno-G'omez, Ana-Maria Bucur, Adrian Cosma, Carlos-D Mart'inez-Hinarejos, Paolo Rosso","doi":"10.48550/arXiv.2401.02746","DOIUrl":null,"url":null,"abstract":"Depression, a prominent contributor to global disability, affects a substantial portion of the population. Efforts to detect depression from social media texts have been prevalent, yet only a few works explored depression detection from user-generated video content. In this work, we address this research gap by proposing a simple and flexible multi-modal temporal model capable of discerning non-verbal depression cues from diverse modalities in noisy, real-world videos. We show that, for in-the-wild videos, using additional high-level non-verbal cues is crucial to achieving good performance, and we extracted and processed audio speech embeddings, face emotion embeddings, face, body and hand landmarks, and gaze and blinking information. Through extensive experiments, we show that our model achieves state-of-the-art results on three key benchmark datasets for depression detection from video by a substantial margin. Our code is publicly available on GitHub.","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":"72 S322","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"0","ListUrlMain":"https://doi.org/10.48550/arXiv.2401.02746","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Depression, a prominent contributor to global disability, affects a substantial portion of the population. Efforts to detect depression from social media texts have been prevalent, yet only a few works have explored depression detection from user-generated video content. In this work, we address this research gap by proposing a simple and flexible multi-modal temporal model capable of discerning non-verbal depression cues from diverse modalities in noisy, real-world videos. We show that, for in-the-wild videos, using additional high-level non-verbal cues is crucial to achieving good performance; accordingly, we extract and process audio speech embeddings; face emotion embeddings; face, body, and hand landmarks; and gaze and blinking information. Through extensive experiments, we show that our model achieves state-of-the-art results on three key benchmark datasets for depression detection from video, surpassing prior work by a substantial margin. Our code is publicly available on GitHub.
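The abstract describes the model only at a high level. As a rough illustration of how a multi-modal temporal model over these cue streams could be wired together, here is a minimal PyTorch sketch. Everything in it is an assumption for illustration only: the modality names, feature dimensions, GRU encoders, and concatenation-based late fusion are placeholders, not the authors' architecture (their actual implementation is in the GitHub repository referenced above).

```python
# Minimal sketch (NOT the paper's implementation) of a late-fusion
# multi-modal temporal model over per-frame non-verbal cue features.
# Modality names and dimensions below are hypothetical.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Encodes one modality's frame-level feature sequence into a vector."""
    def __init__(self, input_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim) -> final hidden state (batch, hidden_dim)
        _, h = self.gru(x)
        return h[-1]

class MultiModalDepressionModel(nn.Module):
    """One temporal encoder per cue stream, fused by concatenation."""
    def __init__(self, modality_dims: dict[str, int], hidden_dim: int = 128):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: ModalityEncoder(dim, hidden_dim)
             for name, dim in modality_dims.items()}
        )
        self.classifier = nn.Linear(hidden_dim * len(modality_dims), 1)

    def forward(self, inputs: dict[str, torch.Tensor]) -> torch.Tensor:
        fused = torch.cat([self.encoders[m](x) for m, x in inputs.items()], dim=-1)
        return self.classifier(fused)  # logit: depressed vs. not depressed

# Hypothetical dimensions for the cues named in the abstract.
dims = {
    "speech_emb": 768,     # audio speech embeddings
    "face_emotion": 256,   # face emotion embeddings
    "landmarks": 478 * 2,  # face/body/hand landmark coordinates (illustrative count)
    "gaze_blink": 4,       # gaze direction + blinking features
}
model = MultiModalDepressionModel(dims)
batch = {name: torch.randn(2, 100, d) for name, d in dims.items()}  # 2 clips, 100 frames
print(model(batch).shape)  # torch.Size([2, 1])
```

A per-modality encoder with simple concatenation keeps the design easy to extend: adding a new cue stream only requires registering another encoder entry, which is one plausible reading of the "simple and flexible" framing in the abstract.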