Huy Q. Vo, Tien-Hao Phung, N. Ly
DOI: 10.1109/NICS51282.2020.9335891
2020 7th NAFOSTED Conference on Information and Computer Science (NICS), published 2020-11-26
VQASTO: Visual Question Answering System for Action Surveillance based on Task Ontology
Question answering (QA) is a popular research topic because of its real-world applications. Visual Question Answering (VQA) research extends it by combining visual and textual information to answer questions. A drawback of these approaches is their dependence on learning models, which impedes human intervention and interpretation. To the best of our knowledge, most of them concentrate on the general problem or on certain specific contexts, but none places a QA system in an action-surveillance context. In this paper, we propose a QA system based on Task Ontology, which is chiefly responsible for mapping a question sentence to the corresponding tasks carried out to reach the appropriate answer. The advantages of a task ontology are the use of human knowledge to solve a specific problem and its reusability. The performance of the system therefore depends heavily on its subtasks/models. Within our scope, we focus on two main subtasks: pose estimation/tracking and skeleton-based action recognition. In addition, we introduce enhancements that improve the time efficiency of pose estimation/tracking, and we propose a new spatial-temporal feature, based on drawing a skeleton sequence into an image, for skeleton-based action recognition on videos in the wild. This method can, to some extent, overcome the challenge of badly shaped poses/skeletons produced by pose estimation on real-world videos. The hard part of action recognition in VQASTO is that it must take its input from pose estimation and pose tracking, which is markedly different from merely performing recognition on readily available, well-formed skeletons.
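The abstract describes a feature that draws a skeleton sequence into a single image so that an ordinary image classifier can perform the action recognition. The paper's exact encoding is not given here, so the following is only a minimal hypothetical sketch of the general idea: each joint of each frame is plotted onto a canvas, with the frame index and the joint index mapped to color channels so that both spatial layout and temporal order survive in one image.

```python
import numpy as np

def skeletons_to_image(skeletons, size=64):
    """Render a skeleton sequence into one size x size x 3 image.

    `skeletons` is a (T, J, 2) array: T frames, J joints, normalized
    (x, y) coordinates in [0, 1]. NaN coordinates mark joints that the
    pose estimator missed, a common defect on real-world videos.

    Channel encoding (an illustrative choice, not the paper's):
      red   = temporal order of the frame,
      green = joint identity,
      blue  = occupancy (a joint landed on this pixel).
    """
    T, J, _ = skeletons.shape
    img = np.zeros((size, size, 3), dtype=np.float32)
    for t in range(T):
        for j in range(J):
            x, y = skeletons[t, j]
            if np.isnan(x) or np.isnan(y):  # skip joints the estimator lost
                continue
            px = min(int(x * (size - 1)), size - 1)
            py = min(int(y * (size - 1)), size - 1)
            img[py, px, 0] = (t + 1) / T    # time encoded as red intensity
            img[py, px, 1] = (j + 1) / J    # joint index encoded as green
            img[py, px, 2] = 1.0            # occupancy mask
    return img
```

Because missing or badly shaped joints simply leave pixels dark instead of breaking a graph structure, a representation of this kind degrades gracefully when upstream pose estimation/tracking is noisy, which matches the robustness motivation stated in the abstract.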