{"title":"基于对象查询的视频摘要","authors":"Shweta S Kakodra, C. Sujatha, P. Desai","doi":"10.1109/CONIT51480.2021.9498311","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a framework for query-by-object(s) based video synopsis. Video Synopsis aims to create a summary of video by retaining the important activities/events present in the input video. We propose creating a shorter video by selecting salient frames based on the important objects present in the video. We train the Yolov3 model with surveillance videos for object detection. Select the frames as salient based on the importance of objects present in a frame and generate the video synopsis with the salient frames. We demonstrate the proposed method on the Summe and TV Sum dataset and own dataset captured from the surveillance camera. We obtain the average F1 score as 93% and average accuracy as 94%. And also show that proposed method gives better results as compared to VASNET model.","PeriodicalId":426131,"journal":{"name":"2021 International Conference on Intelligent Technologies (CONIT)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Query-By-Object Based Video Synopsis\",\"authors\":\"Shweta S Kakodra, C. Sujatha, P. Desai\",\"doi\":\"10.1109/CONIT51480.2021.9498311\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose a framework for query-by-object(s) based video synopsis. Video Synopsis aims to create a summary of video by retaining the important activities/events present in the input video. We propose creating a shorter video by selecting salient frames based on the important objects present in the video. We train the Yolov3 model with surveillance videos for object detection. 
Select the frames as salient based on the importance of objects present in a frame and generate the video synopsis with the salient frames. We demonstrate the proposed method on the Summe and TV Sum dataset and own dataset captured from the surveillance camera. We obtain the average F1 score as 93% and average accuracy as 94%. And also show that proposed method gives better results as compared to VASNET model.\",\"PeriodicalId\":426131,\"journal\":{\"name\":\"2021 International Conference on Intelligent Technologies (CONIT)\",\"volume\":\"82 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Intelligent Technologies (CONIT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CONIT51480.2021.9498311\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Intelligent Technologies (CONIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CONIT51480.2021.9498311","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: In this paper, we propose a framework for query-by-object based video synopsis. Video synopsis aims to create a summary of a video by retaining the important activities/events present in the input video. We propose creating a shorter video by selecting salient frames based on the important objects present in the video. We train a YOLOv3 model on surveillance videos for object detection, mark frames as salient based on the importance of the objects they contain, and generate the video synopsis from these salient frames. We demonstrate the proposed method on the SumMe and TVSum datasets and on our own dataset captured from a surveillance camera, obtaining an average F1 score of 93% and an average accuracy of 94%. We also show that the proposed method gives better results than the VASNet model.
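The frame-selection step described in the abstract can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: per-frame detections would come from the trained YOLOv3 detector, but here they are passed in as plain class-label lists, and the per-class importance weights are assumed values invented for the example.

```python
# Hypothetical sketch of query-by-object salient-frame selection.
# Assumed per-class importance weights (not taken from the paper).
IMPORTANCE = {"person": 1.0, "car": 0.8, "bag": 0.5, "bird": 0.1}

def frame_score(detections, query_objects):
    """Score a frame by the summed importance of detected objects
    that match the user's query."""
    return sum(IMPORTANCE.get(obj, 0.0)
               for obj in detections if obj in query_objects)

def select_salient_frames(frames, query_objects, threshold=0.5):
    """Return indices of frames whose score meets the threshold;
    concatenating these frames would form the synopsis video."""
    return [i for i, dets in enumerate(frames)
            if frame_score(dets, query_objects) >= threshold]

# Example: a five-frame clip, querying for persons and cars.
frames = [["bird"], ["person"], [], ["car", "person"], ["bag"]]
print(select_salient_frames(frames, {"person", "car"}))  # → [1, 3]
```

In practice the detector's confidence scores and object sizes could also feed into the frame score, and consecutive salient frames would be written out with a video I/O library to produce the synopsis.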