{"title":"PhysMamba:利用慢-快时差曼巴进行高效远程生理测量","authors":"Chaoqi Luo, Yiping Xie, Zitong Yu","doi":"arxiv-2409.12031","DOIUrl":null,"url":null,"abstract":"Facial-video based Remote photoplethysmography (rPPG) aims at measuring\nphysiological signals and monitoring heart activity without any contact,\nshowing significant potential in various applications. Previous deep learning\nbased rPPG measurement are primarily based on CNNs and Transformers. However,\nthe limited receptive fields of CNNs restrict their ability to capture\nlong-range spatio-temporal dependencies, while Transformers also struggle with\nmodeling long video sequences with high complexity. Recently, the state space\nmodels (SSMs) represented by Mamba are known for their impressive performance\non capturing long-range dependencies from long sequences. In this paper, we\npropose the PhysMamba, a Mamba-based framework, to efficiently represent\nlong-range physiological dependencies from facial videos. Specifically, we\nintroduce the Temporal Difference Mamba block to first enhance local dynamic\ndifferences and further model the long-range spatio-temporal context. Moreover,\na dual-stream SlowFast architecture is utilized to fuse the multi-scale\ntemporal features. Extensive experiments are conducted on three benchmark\ndatasets to demonstrate the superiority and efficiency of PhysMamba. The codes\nare available at https://github.com/Chaoqi31/PhysMamba","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba\",\"authors\":\"Chaoqi Luo, Yiping Xie, Zitong Yu\",\"doi\":\"arxiv-2409.12031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Facial-video based Remote photoplethysmography (rPPG) aims at measuring\\nphysiological signals and monitoring heart activity without any contact,\\nshowing significant potential in various applications. Previous deep learning\\nbased rPPG measurement are primarily based on CNNs and Transformers. However,\\nthe limited receptive fields of CNNs restrict their ability to capture\\nlong-range spatio-temporal dependencies, while Transformers also struggle with\\nmodeling long video sequences with high complexity. Recently, the state space\\nmodels (SSMs) represented by Mamba are known for their impressive performance\\non capturing long-range dependencies from long sequences. In this paper, we\\npropose the PhysMamba, a Mamba-based framework, to efficiently represent\\nlong-range physiological dependencies from facial videos. Specifically, we\\nintroduce the Temporal Difference Mamba block to first enhance local dynamic\\ndifferences and further model the long-range spatio-temporal context. Moreover,\\na dual-stream SlowFast architecture is utilized to fuse the multi-scale\\ntemporal features. Extensive experiments are conducted on three benchmark\\ndatasets to demonstrate the superiority and efficiency of PhysMamba. 
The codes\\nare available at https://github.com/Chaoqi31/PhysMamba\",\"PeriodicalId\":501130,\"journal\":{\"name\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.12031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.12031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba
Facial-video-based remote photoplethysmography (rPPG) aims to measure physiological signals and monitor heart activity without any contact, showing significant potential in various applications. Previous deep-learning-based rPPG measurement methods are primarily built on CNNs and Transformers. However, the limited receptive fields of CNNs restrict their ability to capture long-range spatio-temporal dependencies, while Transformers struggle to model long video sequences because of their high computational complexity. Recently, state space models (SSMs), represented by Mamba, have shown impressive performance in capturing long-range dependencies from long sequences. In this paper, we propose PhysMamba, a Mamba-based framework that efficiently represents long-range physiological dependencies from facial videos. Specifically, we introduce the Temporal Difference Mamba block, which first enhances local dynamic differences and then models the long-range spatio-temporal context.
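
The abstract does not spell out this block's internals, so the following is only a minimal, hypothetical PyTorch sketch of the general idea: emphasise local frame-to-frame differences first, then model long-range temporal context. All class and variable names are illustrative, and an off-the-shelf recurrent layer stands in for the Mamba/SSM component, which this sketch does not implement.

```python
# Hypothetical sketch of a temporal-difference-enhanced long-range block.
# nn.GRU is only a placeholder for the Mamba (SSM) layer used in the paper.
import torch
import torch.nn as nn


class TemporalDifferenceBlockSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Mixes each frame with its backward temporal difference.
        self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=1)
        self.norm = nn.LayerNorm(channels)
        # Placeholder for the long-range sequence model (Mamba/SSM in the paper).
        self.long_range = nn.GRU(channels, channels, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        # Backward temporal difference, zero-padded at the first frame.
        diff = torch.cat(
            [torch.zeros_like(x[:, :, :1]), x[:, :, 1:] - x[:, :, :-1]], dim=2
        )
        x = self.fuse(torch.cat([x, diff], dim=1))        # enhance local dynamics
        # Collapse spatial dimensions so each frame becomes one token.
        tokens = x.mean(dim=(3, 4)).transpose(1, 2)        # (batch, frames, channels)
        tokens, _ = self.long_range(self.norm(tokens))     # long-range temporal context
        # Broadcast the temporal context back onto the feature map (residual style).
        return x + tokens.transpose(1, 2)[..., None, None]


if __name__ == "__main__":
    clip = torch.randn(2, 16, 32, 8, 8)  # toy clip: 2 videos, 16 channels, 32 frames, 8x8
    print(TemporalDifferenceBlockSketch(16)(clip).shape)  # torch.Size([2, 16, 32, 8, 8])
```

Emphasising frame-to-frame differences is a natural fit for rPPG, since the pulse manifests as subtle temporal colour changes in the skin rather than in any single frame.
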
Moreover, a dual-stream SlowFast architecture is used to fuse multi-scale temporal features. Extensive experiments on three benchmark datasets demonstrate the superiority and efficiency of PhysMamba. The code is available at https://github.com/Chaoqi31/PhysMamba.
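
For the dual-stream SlowFast fusion, a rough sketch of the general mechanism might look like the following. This is again an assumption-laden illustration, not the authors' implementation: a fast branch processes per-frame features at the full frame rate, a slow branch sees a temporally strided copy, and the two scales are aligned in time and fused.

```python
# Hypothetical sketch of dual-stream slow/fast temporal fusion.
# The two Conv1d encoders stand in for the paper's Mamba-based streams.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlowFastFusionSketch(nn.Module):
    def __init__(self, channels: int, slow_stride: int = 4):
        super().__init__()
        self.slow_stride = slow_stride
        self.fast = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.slow = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv1d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames) -- per-frame features, e.g. spatially pooled.
        fast_feat = self.fast(x)                              # fine temporal scale
        slow_feat = self.slow(x[:, :, :: self.slow_stride])   # coarse temporal scale
        # Upsample the slow stream back to the full frame rate before fusing.
        slow_feat = F.interpolate(
            slow_feat, size=x.shape[-1], mode="linear", align_corners=False
        )
        return self.fuse(torch.cat([fast_feat, slow_feat], dim=1))


if __name__ == "__main__":
    feats = torch.randn(2, 64, 128)           # 2 clips, 64-dim features, 128 frames
    print(SlowFastFusionSketch(64)(feats).shape)  # torch.Size([2, 64, 128])
```

The design intuition, as in SlowFast-style video models, is that the strided branch captures slower trends while the full-rate branch preserves fine pulse dynamics; fusing the two gives multi-scale temporal features.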