{"title":"HOIsim:合成逼真的三维人-物交互数据,用于人类活动识别","authors":"Marsil Zakour, Alaeddine Mellouli, R. Chaudhari","doi":"10.1109/RO-MAN50785.2021.9515349","DOIUrl":null,"url":null,"abstract":"Correct understanding of human activities is critical for meaningful assistance by robots in daily life. The development of perception algorithms and Deep Learning models of human activity requires large-scale sensor datasets. Good real-world activity data is, however, difficult and time- consuming to acquire. Several precisely calibrated and time- synchronized sensors are required, and the annotation and labeling of the collected sensor data is extremely labor intensive.To address these challenges, we present a 3D activity simulator, \"HOIsim\", focusing on Human-Object Interactions (HOIs). Using HOIsim, we provide a procedurally generated synthetic dataset of two sample daily life activities \"lunch\" and \"breakfast\". The dataset contains out-of-the-box ground truth annotations in the form of human and object poses, as well as ground truth activity labels. Furthermore, we introduce methods to meaningfully randomize activity flows and the environment topology. This allows us to generate a large number of random variants of these activities in very less time.Based on an abstraction of the low-level pose data in the form of spatiotemporal graphs of HOIs, we evaluate the generated Lunch dataset only with two Deep Learning models for activity recognition. The first model, based on recurrent neural networks achieves an accuracy of 87%, whereas the other, based on transformers, achieves an accuracy of 94.7%.","PeriodicalId":6854,"journal":{"name":"2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN)","volume":"1 1","pages":"1124-1131"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"HOIsim: Synthesizing Realistic 3D Human-Object Interaction Data for Human Activity Recognition\",\"authors\":\"Marsil Zakour, Alaeddine Mellouli, R. Chaudhari\",\"doi\":\"10.1109/RO-MAN50785.2021.9515349\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Correct understanding of human activities is critical for meaningful assistance by robots in daily life. The development of perception algorithms and Deep Learning models of human activity requires large-scale sensor datasets. Good real-world activity data is, however, difficult and time- consuming to acquire. Several precisely calibrated and time- synchronized sensors are required, and the annotation and labeling of the collected sensor data is extremely labor intensive.To address these challenges, we present a 3D activity simulator, \\\"HOIsim\\\", focusing on Human-Object Interactions (HOIs). Using HOIsim, we provide a procedurally generated synthetic dataset of two sample daily life activities \\\"lunch\\\" and \\\"breakfast\\\". The dataset contains out-of-the-box ground truth annotations in the form of human and object poses, as well as ground truth activity labels. Furthermore, we introduce methods to meaningfully randomize activity flows and the environment topology. This allows us to generate a large number of random variants of these activities in very less time.Based on an abstraction of the low-level pose data in the form of spatiotemporal graphs of HOIs, we evaluate the generated Lunch dataset only with two Deep Learning models for activity recognition. 
The first model, based on recurrent neural networks achieves an accuracy of 87%, whereas the other, based on transformers, achieves an accuracy of 94.7%.\",\"PeriodicalId\":6854,\"journal\":{\"name\":\"2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN)\",\"volume\":\"1 1\",\"pages\":\"1124-1131\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RO-MAN50785.2021.9515349\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RO-MAN50785.2021.9515349","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
HOIsim: Synthesizing Realistic 3D Human-Object Interaction Data for Human Activity Recognition
Correct understanding of human activities is critical for meaningful assistance by robots in daily life. The development of perception algorithms and Deep Learning models of human activity requires large-scale sensor datasets. Good real-world activity data is, however, difficult and time-consuming to acquire. Several precisely calibrated and time-synchronized sensors are required, and the annotation and labeling of the collected sensor data is extremely labor-intensive. To address these challenges, we present a 3D activity simulator, "HOIsim", focusing on Human-Object Interactions (HOIs). Using HOIsim, we provide a procedurally generated synthetic dataset of two sample daily-life activities, "lunch" and "breakfast". The dataset contains out-of-the-box ground-truth annotations in the form of human and object poses, as well as ground-truth activity labels. Furthermore, we introduce methods to meaningfully randomize activity flows and the environment topology. This allows us to generate a large number of random variants of these activities in very little time. Based on an abstraction of the low-level pose data in the form of spatiotemporal graphs of HOIs, we evaluate only the generated Lunch dataset with two Deep Learning models for activity recognition. The first model, based on recurrent neural networks, achieves an accuracy of 87%, whereas the second, based on transformers, achieves an accuracy of 94.7%.
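To make the HOI-graph abstraction concrete, the following is a minimal illustrative sketch (assuming Python with NumPy and PyTorch) of how per-frame human and object poses could be turned into a proximity-based spatiotemporal HOI representation and classified with a small recurrent model. The names and numbers here (hoi_adjacency, the 0.3 m distance threshold, the joint and object counts, the GRU classifier) are assumptions for demonstration only and do not reproduce the paper's actual pipeline or its RNN and transformer architectures.

```python
# Illustrative sketch only, not the paper's implementation: abstract per-frame
# human/object poses into a flattened HOI proximity graph, then classify the
# sequence of per-frame graphs with a small GRU. Thresholds, dimensions, and
# names are assumptions for demonstration.
import numpy as np
import torch
import torch.nn as nn

def hoi_adjacency(human_joints, object_positions, threshold=0.3):
    """Per-frame HOI graph: edge (i, j) = 1 if joint i lies within
    `threshold` meters of object j. Returns the flattened adjacency."""
    # human_joints: (J, 3), object_positions: (O, 3)
    dists = np.linalg.norm(
        human_joints[:, None, :] - object_positions[None, :, :], axis=-1
    )
    return (dists < threshold).astype(np.float32).reshape(-1)

class HOISequenceClassifier(nn.Module):
    """GRU over per-frame HOI features; a stand-in for the paper's
    more elaborate RNN and transformer models."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):          # x: (batch, frames, in_dim)
        _, h_n = self.gru(x)       # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])  # logits: (batch, num_classes)

# Toy usage with random poses standing in for HOIsim ground-truth data.
J, O, T = 17, 4, 60                # joints, objects, frames (assumed values)
frames = [hoi_adjacency(np.random.rand(J, 3), np.random.rand(O, 3))
          for _ in range(T)]
seq = torch.from_numpy(np.stack(frames))[None]   # shape (1, T, J*O)
model = HOISequenceClassifier(in_dim=J * O, hidden_dim=64, num_classes=2)
logits = model(seq)                              # e.g. "lunch" vs. "breakfast"
print(logits.shape)                              # torch.Size([1, 2])
```

In the paper's setting, the per-frame features would be computed from HOIsim's ground-truth human and object poses rather than random data, and the reported transformer model would take the place of the GRU shown here.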