An efficient activity recognition for homecare robots from multi-modal communication dataset

International Journal of Advances in Intelligent Informatics. Pub Date: 2023-03-15. DOI: 10.26555/ijain.v9i1.903

Mohamad Yani, Yamada Nao, Chyan Zheng Siow, Kubota Naoyuki


Citations: 1

Abstract

Human environments are designed and managed by humans for humans. Enabling robots to interact with humans and perform specific tasks appropriately in these environments is therefore an essential topic in robotics research. In recent decades, object recognition, human skeleton recognition, and face recognition frameworks have been implemented to support robot tasks. However, recognizing activities and interactions between humans and surrounding objects remains an open and more challenging problem. This study therefore proposes a graph neural network (GNN) approach that directly recognizes human activity at home using vision and speech teaching data. The focus is on classifying three activities (eating, working, and reading) conducted in the same environment. Experiments, observations, and analyses showed that this problem is quite challenging to solve using only traditional convolutional neural networks (CNNs) and video datasets. In the proposed method, activity classification is learned from the 3D positions of detected objects corresponding to the human position. Human utterances are then used to label the activity for the collected human and object 3D positions. The experiment, covering data collection and learning, was demonstrated through human-robot communication. The proposed method had the shortest training time, 100.346 seconds with 6000 positions from the dataset, and recognized the three activities more accurately than the deep layer aggregation (DLA) and X3D networks with video datasets.
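To make the data flow described in the abstract concrete, the following is a minimal sketch of how one labelled sample could be assembled from a human position, detected object positions, and a spoken utterance. It is not the authors' implementation: the keyword map `UTTERANCE_LABELS`, the star-shaped graph (human connected to every object), the distance edge feature, and all function and object names are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass

# Hypothetical keyword-to-label map; the abstract does not say how
# utterances are actually parsed into activity labels.
UTTERANCE_LABELS = {"eating": 0, "working": 1, "reading": 2}


@dataclass
class Node:
    name: str        # illustrative class name, e.g. "human", "cup"
    position: tuple  # (x, y, z), assumed to be in a shared sensor frame


def label_from_utterance(utterance):
    """Return the label whose keyword appears in the utterance, else None."""
    text = utterance.lower()
    for keyword, label in UTTERANCE_LABELS.items():
        if keyword in text:
            return label
    return None


def build_graph_sample(human_pos, objects, utterance):
    """Build one sample: node 0 is the human, linked to every object node.

    Each edge carries the Euclidean human-object distance as its feature
    (an assumed feature choice, not taken from the paper).
    """
    nodes = [Node("human", human_pos)]
    nodes += [Node(name, pos) for name, pos in objects]
    edges = []
    for j in range(1, len(nodes)):
        distance = math.dist(human_pos, nodes[j].position)
        edges.append((0, j, distance))
    return {
        "nodes": nodes,
        "edges": edges,
        "label": label_from_utterance(utterance),
    }


sample = build_graph_sample(
    (0.0, 0.0, 1.0),
    [("cup", (0.3, 0.4, 1.0)), ("laptop", (1.0, 0.0, 1.0))],
    "I am eating now",
)
```

A sample like this could then be passed to any graph learning library as node features plus an edge list. Because each sample holds only a handful of 3D positions rather than video frames, a compact representation of this kind is consistent with the short training time the abstract reports, though the abstract does not state the cause.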


Source journal

International Journal of Advances in Intelligent Informatics Computer Science-Computer Vision and Pattern Recognition

CiteScore

3.00

Self-citation rate

0.00%

Articles published