Visual Question Answering Using Video Clips
Mandar Bhalerao, Shlok Gujar, Aditya A. Bhave, Anant V. Nimkar
2019 IEEE Bombay Section Signature Conference (IBSSC), published 2019-07-01
DOI: 10.1109/IBSSC47189.2019.8973090 (https://doi.org/10.1109/IBSSC47189.2019.8973090)
Visual Question Answering (VQA) is a technique by which humans can ask simple questions about an image and get answers. This technique can be extended to video clips to answer simple questions about what is happening in the video. The system takes a video and a natural language question as input and outputs a natural language answer. It is a multidisciplinary research problem by nature. In this work, we restrict ourselves to binary questions, i.e. questions whose only possible answers are yes or no. The system could be extended further to answer more complex questions.
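
The input/output contract described in the abstract (a video clip and a natural language question in, a yes or no answer out) can be illustrated with a minimal sketch. This is not the authors' model: `VideoClip`, `Scorer`, `answer_binary`, the 0.5 threshold, and the toy `brightness_scorer` are all hypothetical names and choices introduced here for illustration. A real system would replace the scorer with a learned model over video and text features.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical frame representation: rows of pixel intensities in [0, 1].
Frame = Sequence[Sequence[float]]


@dataclass(frozen=True)
class VideoClip:
    """A video clip as an ordered sequence of frames (assumed representation)."""
    frames: Sequence[Frame]


# Hypothetical scorer type: maps (clip, question) to the estimated
# probability that the answer is "yes".
Scorer = Callable[[VideoClip, str], float]


def answer_binary(clip: VideoClip, question: str, scorer: Scorer,
                  threshold: float = 0.5) -> str:
    """Return "yes" or "no" for a question about a clip.

    The threshold of 0.5 is an assumption, not a value from the paper.
    """
    if not clip.frames:
        raise ValueError("clip must contain at least one frame")
    if not question.strip():
        raise ValueError("question must not be empty")
    p_yes = scorer(clip, question)
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("scorer must return a probability in [0, 1]")
    return "yes" if p_yes >= threshold else "no"


def brightness_scorer(clip: VideoClip, question: str) -> float:
    """Toy stand-in for a trained model: ignores the question and
    returns the mean pixel intensity over all frames."""
    total = 0.0
    count = 0
    for frame in clip.frames:
        for row in frame:
            total += sum(row)
            count += len(row)
    return total / count if count else 0.0


if __name__ == "__main__":
    clip = VideoClip(frames=[[[0.9, 0.8], [0.7, 1.0]]])
    print(answer_binary(clip, "Is the scene bright?", brightness_scorer))
```

Restricting the output to two labels reduces the task to binary classification over a (video, question) pair, which is why a single thresholded score is enough in this sketch.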