Huy Q. Vo, Tien-Hao Phung, N. Ly
DOI: 10.1109/NICS51282.2020.9335891
2020 7th NAFOSTED Conference on Information and Computer Science (NICS), published 2020-11-26
VQASTO: Visual Question Answering System for Action Surveillance based on Task Ontology
Question answering (QA) is a popular research topic because of its real-world applications. Visual Question Answering (VQA) research extends it by combining visual and textual information to answer questions. A drawback of these approaches is their dependence on learning models, which impedes human intervention and interpretation. To the best of our knowledge, most of them concentrate on the general problem or on certain specific contexts, but none places a QA system in an action-surveillance context. In this paper, we propose a QA system based on Task Ontology, which is chiefly responsible for mapping a question sentence to the corresponding tasks carried out to reach the appropriate answer. The advantages of a task ontology are the use of human knowledge to solve a specific problem and its reusability. The performance of the system therefore depends heavily on its subtasks/models. Within our scope, we focus on two main subtasks: pose estimation/tracking and skeleton-based action recognition. In addition, we introduce enhancements that improve the time efficiency of pose estimation/tracking, and we propose a new spatial-temporal feature, based on drawing a skeleton sequence into an image, for skeleton-based action recognition on videos in the wild. This method can, to some extent, overcome the challenge of badly shaped poses/skeletons produced by pose estimation on real-world videos. The hard part of action recognition in VQASTO is that it must take its input from pose estimation and pose tracking, which is markedly different from merely performing recognition on readily available, well-formed skeletons.
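The abstract describes a feature that draws a skeleton sequence into a single image so that an ordinary image classifier can perform the action recognition. The paper's exact encoding is not given here, so the following is only a minimal hypothetical sketch of the general idea: each joint of each frame is plotted onto a canvas, with the frame index and the joint index mapped to color channels so that both spatial layout and temporal order survive in one image.

```python
import numpy as np

def skeletons_to_image(skeletons, size=64):
    """Render a skeleton sequence into one size x size x 3 image.

    `skeletons` is a (T, J, 2) array: T frames, J joints, normalized
    (x, y) coordinates in [0, 1]. NaN coordinates mark joints that the
    pose estimator missed, a common defect on real-world videos.

    Channel encoding (an illustrative choice, not the paper's):
      red   = temporal order of the frame,
      green = joint identity,
      blue  = occupancy (a joint landed on this pixel).
    """
    T, J, _ = skeletons.shape
    img = np.zeros((size, size, 3), dtype=np.float32)
    for t in range(T):
        for j in range(J):
            x, y = skeletons[t, j]
            if np.isnan(x) or np.isnan(y):  # skip joints the estimator lost
                continue
            px = min(int(x * (size - 1)), size - 1)
            py = min(int(y * (size - 1)), size - 1)
            img[py, px, 0] = (t + 1) / T    # time encoded as red intensity
            img[py, px, 1] = (j + 1) / J    # joint index encoded as green
            img[py, px, 2] = 1.0            # occupancy mask
    return img
```

Because missing or badly shaped joints simply leave pixels dark instead of breaking a graph structure, a representation of this kind degrades gracefully when upstream pose estimation/tracking is noisy, which matches the robustness motivation stated in the abstract.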