{"title":"PhysMamba:利用慢-快时差曼巴进行高效远程生理测量","authors":"Chaoqi Luo, Yiping Xie, Zitong Yu","doi":"arxiv-2409.12031","DOIUrl":null,"url":null,"abstract":"Facial-video based Remote photoplethysmography (rPPG) aims at measuring\nphysiological signals and monitoring heart activity without any contact,\nshowing significant potential in various applications. Previous deep learning\nbased rPPG measurement are primarily based on CNNs and Transformers. However,\nthe limited receptive fields of CNNs restrict their ability to capture\nlong-range spatio-temporal dependencies, while Transformers also struggle with\nmodeling long video sequences with high complexity. Recently, the state space\nmodels (SSMs) represented by Mamba are known for their impressive performance\non capturing long-range dependencies from long sequences. In this paper, we\npropose the PhysMamba, a Mamba-based framework, to efficiently represent\nlong-range physiological dependencies from facial videos. Specifically, we\nintroduce the Temporal Difference Mamba block to first enhance local dynamic\ndifferences and further model the long-range spatio-temporal context. Moreover,\na dual-stream SlowFast architecture is utilized to fuse the multi-scale\ntemporal features. Extensive experiments are conducted on three benchmark\ndatasets to demonstrate the superiority and efficiency of PhysMamba. The codes\nare available at https://github.com/Chaoqi31/PhysMamba","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba\",\"authors\":\"Chaoqi Luo, Yiping Xie, Zitong Yu\",\"doi\":\"arxiv-2409.12031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Facial-video based Remote photoplethysmography (rPPG) aims at measuring\\nphysiological signals and monitoring heart activity without any contact,\\nshowing significant potential in various applications. Previous deep learning\\nbased rPPG measurement are primarily based on CNNs and Transformers. However,\\nthe limited receptive fields of CNNs restrict their ability to capture\\nlong-range spatio-temporal dependencies, while Transformers also struggle with\\nmodeling long video sequences with high complexity. Recently, the state space\\nmodels (SSMs) represented by Mamba are known for their impressive performance\\non capturing long-range dependencies from long sequences. In this paper, we\\npropose the PhysMamba, a Mamba-based framework, to efficiently represent\\nlong-range physiological dependencies from facial videos. Specifically, we\\nintroduce the Temporal Difference Mamba block to first enhance local dynamic\\ndifferences and further model the long-range spatio-temporal context. Moreover,\\na dual-stream SlowFast architecture is utilized to fuse the multi-scale\\ntemporal features. Extensive experiments are conducted on three benchmark\\ndatasets to demonstrate the superiority and efficiency of PhysMamba. 
The codes\\nare available at https://github.com/Chaoqi31/PhysMamba\",\"PeriodicalId\":501130,\"journal\":{\"name\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.12031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.12031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba
Facial-video-based remote photoplethysmography (rPPG) aims to measure physiological signals and monitor heart activity without any contact, showing significant potential in various applications. Previous deep-learning-based rPPG measurement methods are primarily built on CNNs and Transformers. However, the limited receptive fields of CNNs restrict their ability to capture long-range spatio-temporal dependencies, while Transformers struggle to model long video sequences because of their high computational complexity. Recently, state space models (SSMs), represented by Mamba, have shown impressive performance in capturing long-range dependencies from long sequences. In this paper, we propose PhysMamba, a Mamba-based framework that efficiently represents long-range physiological dependencies from facial videos. Specifically, we introduce the Temporal Difference Mamba block, which first enhances local dynamic differences and then models the long-range spatio-temporal context.
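
The abstract does not spell out this block's internals, so the following is only a minimal, hypothetical PyTorch sketch of the general idea: emphasise local frame-to-frame differences first, then model long-range temporal context. All class and variable names are illustrative, and an off-the-shelf recurrent layer stands in for the Mamba/SSM component, which this sketch does not implement.

```python
# Hypothetical sketch of a temporal-difference-enhanced long-range block.
# nn.GRU is only a placeholder for the Mamba (SSM) layer used in the paper.
import torch
import torch.nn as nn


class TemporalDifferenceBlockSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Mixes each frame with its backward temporal difference.
        self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=1)
        self.norm = nn.LayerNorm(channels)
        # Placeholder for the long-range sequence model (Mamba/SSM in the paper).
        self.long_range = nn.GRU(channels, channels, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        # Backward temporal difference, zero-padded at the first frame.
        diff = torch.cat(
            [torch.zeros_like(x[:, :, :1]), x[:, :, 1:] - x[:, :, :-1]], dim=2
        )
        x = self.fuse(torch.cat([x, diff], dim=1))        # enhance local dynamics
        # Collapse spatial dimensions so each frame becomes one token.
        tokens = x.mean(dim=(3, 4)).transpose(1, 2)        # (batch, frames, channels)
        tokens, _ = self.long_range(self.norm(tokens))     # long-range temporal context
        # Broadcast the temporal context back onto the feature map (residual style).
        return x + tokens.transpose(1, 2)[..., None, None]


if __name__ == "__main__":
    clip = torch.randn(2, 16, 32, 8, 8)  # toy clip: 2 videos, 16 channels, 32 frames, 8x8
    print(TemporalDifferenceBlockSketch(16)(clip).shape)  # torch.Size([2, 16, 32, 8, 8])
```

Emphasising frame-to-frame differences is a natural fit for rPPG, since the pulse manifests as subtle temporal colour changes in the skin rather than in any single frame.
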
Moreover, a dual-stream SlowFast architecture is used to fuse multi-scale temporal features. Extensive experiments on three benchmark datasets demonstrate the superiority and efficiency of PhysMamba. The code is available at https://github.com/Chaoqi31/PhysMamba.
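
For the dual-stream SlowFast fusion, a rough sketch of the general mechanism might look like the following. This is again an assumption-laden illustration, not the authors' implementation: a fast branch processes per-frame features at the full frame rate, a slow branch sees a temporally strided copy, and the two scales are aligned in time and fused.

```python
# Hypothetical sketch of dual-stream slow/fast temporal fusion.
# The two Conv1d encoders stand in for the paper's Mamba-based streams.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlowFastFusionSketch(nn.Module):
    def __init__(self, channels: int, slow_stride: int = 4):
        super().__init__()
        self.slow_stride = slow_stride
        self.fast = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.slow = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv1d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames) -- per-frame features, e.g. spatially pooled.
        fast_feat = self.fast(x)                              # fine temporal scale
        slow_feat = self.slow(x[:, :, :: self.slow_stride])   # coarse temporal scale
        # Upsample the slow stream back to the full frame rate before fusing.
        slow_feat = F.interpolate(
            slow_feat, size=x.shape[-1], mode="linear", align_corners=False
        )
        return self.fuse(torch.cat([fast_feat, slow_feat], dim=1))


if __name__ == "__main__":
    feats = torch.randn(2, 64, 128)           # 2 clips, 64-dim features, 128 frames
    print(SlowFastFusionSketch(64)(feats).shape)  # torch.Size([2, 64, 128])
```

The design intuition, as in SlowFast-style video models, is that the strided branch captures slower trends while the full-rate branch preserves fine pulse dynamics; fusing the two gives multi-scale temporal features.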