Motion Based Video Skimming
I. Alam, Devesh Jalan, Priti Shaw, Partha Pratim Mohanta
2020 IEEE Calcutta Conference (CALCON), February 2020
DOI: 10.1109/CALCON49167.2020.9106488
Automatic video summarization provides an efficient browsing and searching mechanism for long videos, and video skimming is one of the popular ways to represent a summary of a full-length video. This work describes an unsupervised technique that automatically extracts the important clips from an input video and generates a summarized version of it. The proposed video-skimming scheme is composed of three parts: extraction of motion-based features, selection of important clips, and detection and removal of any shot boundary within a clip. Each frame is represented by a 32-dimensional feature vector generated from the slope and magnitude of its motion vectors. A set of representative frames for the entire video is obtained using k-means clustering followed by a Maximal Spanning Tree (MxST). These representative frames become the center points of the clips to be generated: a window is taken around each representative frame to form a clip. A shot boundary may still exist within a clip. To detect such a boundary, a method is proposed that considers the variation in the pixel intensities of the frames of a clip, captured by the standard deviation of the pixel-intensity distribution; the clips are re-formed whenever a boundary is detected. Finally, the skim is generated by concatenating the extracted clips in sequential order. The resulting summaries are concise and properly represent the input videos. Experiments are performed on two benchmark datasets, SumMe and TVSum, and the results show that the proposed method outperforms state-of-the-art methods.
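Two of the steps above can be sketched in code: building the 32-dimensional per-frame descriptor from motion-vector slope and magnitude, and flagging a shot boundary inside a clip from the standard deviation of the pixel-intensity distribution. This is a minimal sketch in Python/NumPy under stated assumptions: the abstract only gives the descriptor's dimensionality, so the 8-angle-by-4-magnitude binning, the z-score thresholding rule, and all function names here are illustrative choices, not the paper's actual method.

```python
import numpy as np

def motion_feature(motion_vectors, n_angle_bins=8, n_mag_bins=4, max_mag=16.0):
    """32-D frame descriptor from motion-vector slope (angle) and magnitude.

    The 8x4 angle-by-magnitude histogram is an assumption; the paper only
    states that the vector is 32-dimensional and uses slope and magnitude.
    `motion_vectors` is an (N, 2) array of per-block (dx, dy) displacements.
    """
    dx, dy = motion_vectors[:, 0], motion_vectors[:, 1]
    angles = np.arctan2(dy, dx)                      # slope of each vector
    mags = np.clip(np.hypot(dx, dy), 0.0, max_mag - 1e-9)
    a_idx = ((angles + np.pi) / (2 * np.pi) * n_angle_bins).astype(int) % n_angle_bins
    m_idx = (mags / max_mag * n_mag_bins).astype(int)
    hist = np.zeros((n_angle_bins, n_mag_bins))
    np.add.at(hist, (a_idx, m_idx), 1)               # accumulate joint histogram
    hist = hist.ravel()                              # 8 * 4 = 32 dimensions
    return hist / max(hist.sum(), 1)                 # normalize to sum to 1

def shot_boundary(clip_frames, z_thresh=2.5):
    """Flag frames inside a clip where a new shot may start.

    Per the abstract, each frame is summarized by the standard deviation of
    its pixel-intensity distribution; the z-score test on consecutive
    differences is an assumed detection rule, not the paper's.
    Returns the indices of frames that likely open a new shot.
    """
    stds = np.array([f.std() for f in clip_frames])  # intensity spread per frame
    diffs = np.abs(np.diff(stds))                    # frame-to-frame change
    mu, sigma = diffs.mean(), diffs.std() + 1e-9
    idx = np.where((diffs - mu) / sigma > z_thresh)[0]
    return (idx + 1).tolist()
```

A clip whose first half is flat and whose second half is high-contrast would yield a single large jump in the per-frame standard deviation, so `shot_boundary` would report the first frame of the second half; the clip could then be re-formed on either side of that index, as the abstract describes.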