在自由生活环境中使用智能手机上的非视觉自我中心传感器进行多模态日常生活记录

Ke Sun, Chunyu Xia, Xinyu Zhang, Hao Chen, C. Zhang
{"title":"在自由生活环境中使用智能手机上的非视觉自我中心传感器进行多模态日常生活记录","authors":"Ke Sun, Chunyu Xia, Xinyu Zhang, Hao Chen, C. Zhang","doi":"10.1145/3643553","DOIUrl":null,"url":null,"abstract":"Egocentric non-intrusive sensing of human activities of daily living (ADL) in free-living environments represents a holy grail in ubiquitous computing. Existing approaches, such as egocentric vision and wearable motion sensors, either can be intrusive or have limitations in capturing non-ambulatory actions. To address these challenges, we propose EgoADL, the first egocentric ADL sensing system that uses an in-pocket smartphone as a multi-modal sensor hub to capture body motion, interactions with the physical environment and daily objects using non-visual sensors (audio, wireless sensing, and motion sensors). We collected a 120-hour multimodal dataset and annotated 20-hour data into 221 ADL, 70 object interactions, and 91 actions. EgoADL proposes multi-modal frame-wise slow-fast encoders to learn the feature representation of multi-sensory data that characterizes the complementary advantages of different modalities and adapt a transformer-based sequence-to-sequence model to decode the time-series sensor signals into a sequence of words that represent ADL. In addition, we introduce a self-supervised learning framework that extracts intrinsic supervisory signals from the multi-modal sensing data to overcome the lack of labeling data and achieve better generalization and extensibility. Our experiments in free-living environments demonstrate that EgoADL can achieve comparable performance with video-based approaches, bringing the vision of ambient intelligence closer to reality.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"70 5","pages":"17:1-17:32"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multimodal Daily-Life Logging in Free-living Environment Using Non-Visual Egocentric Sensors on a Smartphone\",\"authors\":\"Ke Sun, Chunyu Xia, Xinyu Zhang, Hao Chen, C. Zhang\",\"doi\":\"10.1145/3643553\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Egocentric non-intrusive sensing of human activities of daily living (ADL) in free-living environments represents a holy grail in ubiquitous computing. Existing approaches, such as egocentric vision and wearable motion sensors, either can be intrusive or have limitations in capturing non-ambulatory actions. To address these challenges, we propose EgoADL, the first egocentric ADL sensing system that uses an in-pocket smartphone as a multi-modal sensor hub to capture body motion, interactions with the physical environment and daily objects using non-visual sensors (audio, wireless sensing, and motion sensors). We collected a 120-hour multimodal dataset and annotated 20-hour data into 221 ADL, 70 object interactions, and 91 actions. EgoADL proposes multi-modal frame-wise slow-fast encoders to learn the feature representation of multi-sensory data that characterizes the complementary advantages of different modalities and adapt a transformer-based sequence-to-sequence model to decode the time-series sensor signals into a sequence of words that represent ADL. In addition, we introduce a self-supervised learning framework that extracts intrinsic supervisory signals from the multi-modal sensing data to overcome the lack of labeling data and achieve better generalization and extensibility. Our experiments in free-living environments demonstrate that EgoADL can achieve comparable performance with video-based approaches, bringing the vision of ambient intelligence closer to reality.\",\"PeriodicalId\":20463,\"journal\":{\"name\":\"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.\",\"volume\":\"70 5\",\"pages\":\"17:1-17:32\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3643553\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3643553","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

在自由生活环境中以自我为中心对人类日常生活(ADL)活动进行非侵入式传感,是泛在计算领域的一个圣杯。现有的方法,如以自我为中心的视觉和可穿戴运动传感器,要么具有侵入性,要么在捕捉非步行动作方面存在局限性。为了应对这些挑战,我们提出了 EgoADL,这是首个以自我为中心的日常活动量传感系统,它使用口袋中的智能手机作为多模式传感器中枢,利用非视觉传感器(音频、无线传感和运动传感器)捕捉身体运动、与物理环境和日常物品的互动。我们收集了 120 小时的多模态数据集,并将 20 小时的数据注释为 221 项日常活动、70 项物体互动和 91 项行动。EgoADL 提出了多模态帧慢-快编码器来学习多感官数据的特征表征,该表征能体现不同模态的互补优势,并调整基于变换器的序列-序列模型,将时间序列传感器信号解码为代表 ADL 的词序列。此外,我们还引入了自监督学习框架,从多模态传感数据中提取内在监督信号,以克服标签数据的缺乏,实现更好的泛化和扩展性。我们在自由生活环境中进行的实验证明,EgoADL 可以达到与基于视频的方法相当的性能,使环境智能的愿景更接近现实。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Multimodal Daily-Life Logging in Free-living Environment Using Non-Visual Egocentric Sensors on a Smartphone
Egocentric non-intrusive sensing of human activities of daily living (ADL) in free-living environments represents a holy grail in ubiquitous computing. Existing approaches, such as egocentric vision and wearable motion sensors, either can be intrusive or have limitations in capturing non-ambulatory actions. To address these challenges, we propose EgoADL, the first egocentric ADL sensing system that uses an in-pocket smartphone as a multi-modal sensor hub to capture body motion, interactions with the physical environment and daily objects using non-visual sensors (audio, wireless sensing, and motion sensors). We collected a 120-hour multimodal dataset and annotated 20-hour data into 221 ADL, 70 object interactions, and 91 actions. EgoADL proposes multi-modal frame-wise slow-fast encoders to learn the feature representation of multi-sensory data that characterizes the complementary advantages of different modalities and adapt a transformer-based sequence-to-sequence model to decode the time-series sensor signals into a sequence of words that represent ADL. In addition, we introduce a self-supervised learning framework that extracts intrinsic supervisory signals from the multi-modal sensing data to overcome the lack of labeling data and achieve better generalization and extensibility. Our experiments in free-living environments demonstrate that EgoADL can achieve comparable performance with video-based approaches, bringing the vision of ambient intelligence closer to reality.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信