Zhe Yang;Ying Zhang;Yanjun Li;Linchong Huang;Ping Hu;Yuexiang Lin
{"title":"融合惯性和高分辨率声学数据的隐私保护人体活动识别","authors":"Zhe Yang;Ying Zhang;Yanjun Li;Linchong Huang;Ping Hu;Yuexiang Lin","doi":"10.1109/TIM.2025.3565250","DOIUrl":null,"url":null,"abstract":"Multimodal human activity recognition (HAR) offers significant advantages over single-modality approaches, particularly in the recently discussed fusion of inertial and acoustic data. However, audio information often contains sensitive personal information. While some studies have focused on audio privacy protection, speech frequencies (below 8 kHz) can still be potentially reconstructed using deep learning techniques. This article presents a novel approach to protect audio privacy in multimodal HAR by utilizing low-cost microphones to extract high-resolution (Hi-res) audio and filtering sensitive information at both nonspeech (<inline-formula> <tex-math>$8\\sim 96$ </tex-math></inline-formula> kHz) and inaudible (<inline-formula> <tex-math>$20\\sim 96$ </tex-math></inline-formula> kHz) levels. We collected a dataset of 20 comprehensive daily activities from 15 participants using custom hardware, with ground truth built from video evidence. Building on this foundation, this article proposes a new hybrid-attention-based HAR method, which leverages self-attention (SA) for extracting salient features in both the temporal and latent space domains, as well as cross-attention (CA) for exploring intermodal relationships. According to the evaluation on the collected dataset, the proposed method demonstrates significant performance improvements over single-modality approaches and outperforms common direct concatenation fusion methods. In addition, inaudible ultrasonic frequencies have demonstrated the ability to differentiate certain activities, making them effective for multimodal fusion in scenarios with strict privacy requirements.","PeriodicalId":13341,"journal":{"name":"IEEE Transactions on Instrumentation and Measurement","volume":"74 ","pages":"1-20"},"PeriodicalIF":5.6000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fusion of Inertial and High-Resolution Acoustic Data for Privacy-Preserving Human Activity Recognition\",\"authors\":\"Zhe Yang;Ying Zhang;Yanjun Li;Linchong Huang;Ping Hu;Yuexiang Lin\",\"doi\":\"10.1109/TIM.2025.3565250\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimodal human activity recognition (HAR) offers significant advantages over single-modality approaches, particularly in the recently discussed fusion of inertial and acoustic data. However, audio information often contains sensitive personal information. While some studies have focused on audio privacy protection, speech frequencies (below 8 kHz) can still be potentially reconstructed using deep learning techniques. This article presents a novel approach to protect audio privacy in multimodal HAR by utilizing low-cost microphones to extract high-resolution (Hi-res) audio and filtering sensitive information at both nonspeech (<inline-formula> <tex-math>$8\\\\sim 96$ </tex-math></inline-formula> kHz) and inaudible (<inline-formula> <tex-math>$20\\\\sim 96$ </tex-math></inline-formula> kHz) levels. We collected a dataset of 20 comprehensive daily activities from 15 participants using custom hardware, with ground truth built from video evidence. 
Building on this foundation, this article proposes a new hybrid-attention-based HAR method, which leverages self-attention (SA) for extracting salient features in both the temporal and latent space domains, as well as cross-attention (CA) for exploring intermodal relationships. According to the evaluation on the collected dataset, the proposed method demonstrates significant performance improvements over single-modality approaches and outperforms common direct concatenation fusion methods. In addition, inaudible ultrasonic frequencies have demonstrated the ability to differentiate certain activities, making them effective for multimodal fusion in scenarios with strict privacy requirements.\",\"PeriodicalId\":13341,\"journal\":{\"name\":\"IEEE Transactions on Instrumentation and Measurement\",\"volume\":\"74 \",\"pages\":\"1-20\"},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2025-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Instrumentation and Measurement\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10980212/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Instrumentation and Measurement","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10980212/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Fusion of Inertial and High-Resolution Acoustic Data for Privacy-Preserving Human Activity Recognition
Multimodal human activity recognition (HAR) offers significant advantages over single-modality approaches, particularly in the recently explored fusion of inertial and acoustic data. However, audio often contains sensitive personal information. Although some studies have addressed audio privacy protection, speech content (below 8 kHz) can still potentially be reconstructed using deep learning techniques. This article presents a novel approach to protecting audio privacy in multimodal HAR: low-cost microphones capture high-resolution (Hi-res) audio, and sensitive content is filtered out at two levels, retaining only the nonspeech (8-96 kHz) or inaudible (20-96 kHz) bands. We collected a dataset of 20 comprehensive daily activities from 15 participants using custom hardware, with ground truth established from video evidence. Building on this foundation, the article proposes a new hybrid-attention-based HAR method that leverages self-attention (SA) to extract salient features in both the temporal and latent-space domains, and cross-attention (CA) to explore intermodal relationships. In evaluations on the collected dataset, the proposed method shows significant performance improvements over single-modality approaches and outperforms common direct-concatenation fusion methods. In addition, the inaudible ultrasonic band proved able to differentiate certain activities, making it effective for multimodal fusion in scenarios with strict privacy requirements.
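To make the privacy-filtering step concrete, below is a minimal sketch of the kind of high-pass filtering the abstract describes: removing the speech band (below 8 kHz, or below 20 kHz for the inaudible setting) from Hi-res audio. The 192 kHz sampling rate is an inference from the stated 96 kHz upper band (Nyquist); the filter design, order, and function are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of band-restricted privacy filtering.
# Assumptions: 192 kHz sampling (implied by the 96 kHz upper band),
# a Butterworth high-pass; the paper's actual pipeline may differ.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def privacy_filter(audio: np.ndarray, fs: int = 192_000,
                   cutoff_hz: float = 8_000.0, order: int = 8) -> np.ndarray:
    """High-pass the signal so only nonspeech content remains.

    Use cutoff_hz=8_000.0 for the nonspeech (8-96 kHz) level and
    cutoff_hz=20_000.0 for the inaudible (20-96 kHz) level.
    """
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, audio)

# Example: one second of synthetic audio; keep only 8-96 kHz content.
fs = 192_000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * np.sin(2 * np.pi * 30_000 * t)
nonspeech = privacy_filter(signal, fs=fs, cutoff_hz=8_000.0)
```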
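The abstract does not specify the hybrid-attention architecture, so the following is only a hypothetical sketch of how SA within each modality and CA across modalities could be wired in PyTorch. All dimensions, layer counts, the direction of the cross-attention query, and the mean-pool classifier head are assumptions; n_classes=20 matches the 20 activities in the dataset.

```python
# Minimal, hypothetical sketch of hybrid attention for inertial-acoustic
# fusion: self-attention (SA) within each stream, then cross-attention
# (CA) where inertial features query acoustic features.
import torch
import torch.nn as nn

class HybridAttentionFusion(nn.Module):
    def __init__(self, d_model: int = 128, n_heads: int = 4, n_classes: int = 20):
        super().__init__()
        self.sa_imu = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.sa_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ca = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, imu: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # imu, audio: (batch, seq_len, d_model) per-modality feature sequences.
        imu_sa, _ = self.sa_imu(imu, imu, imu)          # SA within inertial stream
        aud_sa, _ = self.sa_audio(audio, audio, audio)  # SA within acoustic stream
        fused, _ = self.ca(imu_sa, aud_sa, aud_sa)      # CA: inertial queries audio
        return self.head(fused.mean(dim=1))             # pool over time, classify

# Example: a batch of 8 windows, 50 time steps per modality.
model = HybridAttentionFusion()
logits = model(torch.randn(8, 50, 128), torch.randn(8, 50, 128))
```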
Journal introduction:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support for the establishment and maintenance of technical standards in the field of Instrumentation and Measurement.