Lei Zhang;Haoran Ning;Jiaxin Tang;Zhenxiang Chen;Yaping Zhong;Yahong Han
WiViPose: A Video-Aided Wi-Fi Framework for Environment-Independent 3D Human Pose Estimation
IEEE Transactions on Multimedia, vol. 27, pp. 5225-5240. Published 2025-02-18. DOI: 10.1109/TMM.2025.3543090
https://ieeexplore.ieee.org/document/10891574/
Citations: 0
Abstract
The inherent complexity of Wi-Fi signals makes video-aided Wi-Fi 3D pose estimation difficult. The challenges include the limited generalizability of the task across diverse environments, its significant signal heterogeneity, and its inadequate ability to analyze local and geometric information. To overcome these challenges, we introduce WiViPose, a video-aided Wi-Fi framework for 3D pose estimation, which attains enhanced cross-environment generalization through cross-layer optimization. Bilinear temporal-spectral fusion (BTSF) is initially used to fuse the time-domain and frequency-domain features derived from Wi-Fi. Video features are derived from a multiresolution convolutional pose machine and enhanced by local self-attention. Cross-modality data fusion is facilitated through an attention-based transformer, with the process further refined under a supervisory mechanism. WiViPose demonstrates effectiveness by achieving an average percentage of correct keypoints (PCK)@50 of 91.01% across three typical indoor environments.
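The PCK metric reported in the abstract can be illustrated with a minimal sketch. The abstract does not say how the @50 threshold is normalized, so the version below assumes a keypoint counts as correct when its Euclidean error is at most `threshold` times a reference length `norm`. The function name `pck`, its parameters, and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math


def pck(pred, gt, threshold, norm=1.0):
    """Fraction of keypoints whose Euclidean error is <= threshold * norm.

    pred, gt: equal-length sequences of (x, y, z) coordinates.
    threshold: fraction of the reference length (0.5 for an "@50" reading).
    norm: reference length; its definition is an assumption of this sketch.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    limit = threshold * norm
    correct = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= limit)
    return correct / len(gt)


# Toy data: four ground-truth keypoints and predictions with
# errors of 0.1, 0.4, 0.6, and 0.0 (in the same units as norm).
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
pred = [(0.1, 0.0, 0.0), (1.0, 0.4, 0.0), (0.0, 2.0, 0.6), (0.0, 0.0, 3.0)]

score = pck(pred, gt, threshold=0.5, norm=1.0)
print(score)  # 3 of 4 keypoints fall within 0.5, so this prints 0.75
```

Under this reading, the 91.01% figure would be such a score averaged over all test frames and the three environments; the paper's exact normalization may differ.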
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.