Multimodal Daily-Life Logging in Free-living Environment Using Non-Visual Egocentric Sensors on a Smartphone

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. Published 2024-03-06. DOI: 10.1145/3643553

Ke Sun, Chunyu Xia, Xinyu Zhang, Hao Chen, C. Zhang

Pages 17:1-17:32

Abstract

Egocentric, non-intrusive sensing of human activities of daily living (ADL) in free-living environments is a holy grail of ubiquitous computing. Existing approaches, such as egocentric vision and wearable motion sensors, can either be intrusive or be limited in capturing non-ambulatory actions. To address these challenges, we propose EgoADL, the first egocentric ADL sensing system that uses an in-pocket smartphone as a multi-modal sensor hub to capture body motion and interactions with the physical environment and daily objects using non-visual sensors (audio, wireless sensing, and motion sensors). We collected a 120-hour multimodal dataset and annotated 20 hours of it with 221 ADLs, 70 object interactions, and 91 actions. EgoADL uses multi-modal frame-wise slow-fast encoders to learn feature representations of the multi-sensory data that capture the complementary advantages of the different modalities, and adapts a transformer-based sequence-to-sequence model to decode the time-series sensor signals into a sequence of words that represents the ADL. In addition, we introduce a self-supervised learning framework that extracts intrinsic supervisory signals from the multi-modal sensing data to overcome the scarcity of labeled data and achieve better generalization and extensibility. Our experiments in free-living environments demonstrate that EgoADL can achieve performance comparable to video-based approaches, bringing the vision of ambient intelligence closer to reality.
