{"title":"An EM algorithm for video summarization, generative model approach","authors":"Xavier Orriols, Xavier Binefa","doi":"10.1109/ICCV.2001.937645","DOIUrl":null,"url":null,"abstract":"In this paper, we address the visual video summarization problem in a Bayesian framework in order to detect and describe the underlying temporal transformation symmetries in a video sequence. Given a set of time correlated frames, we attempt to extract a reduced number of image-like data structures which are semantically meaningful and that have the ability of representing the sequence evolution. To this end, we present a generative model which involves jointly the representation and the evolution of appearance. Applying Linear Dynamical System theory to this problem, we discuss how the temporal information is encoded yielding a manner of grouping the iconic representations of the video sequence in terms of invariance. The formulation of this problem is driven in terms of a probabilistic approach, which affords a measure of perceptual similarity taking both learned appearance and time evolution models into account.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV.2001.937645","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 27
Abstract
In this paper, we address the visual video summarization problem in a Bayesian framework in order to detect and describe the underlying temporal transformation symmetries in a video sequence. Given a set of time-correlated frames, we attempt to extract a small number of semantically meaningful, image-like data structures that can represent the evolution of the sequence. To this end, we present a generative model that jointly captures the representation of appearance and its temporal evolution. Applying Linear Dynamical System theory to this problem, we discuss how the temporal information is encoded, which yields a way of grouping the iconic representations of the video sequence in terms of invariance. The problem is formulated in probabilistic terms, affording a measure of perceptual similarity that takes both the learned appearance model and the time evolution model into account.
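The abstract does not spell out the estimation procedure, so the following is only an illustrative sketch of the kind of machinery the paper builds on: EM for a standard linear dynamical system (Kalman-filter/RTS-smoother E-step, closed-form M-step, in the spirit of Ghahramani and Hinton, 1996), applied to low-dimensional appearance coefficients y_t that would stand in for projected video frames. The function names (kalman_smoother, em_lds), the latent dimension, and the synthetic data are all assumptions for illustration, not the authors' formulation.

```python
# Sketch: EM for a linear dynamical system x_t = A x_{t-1} + w, y_t = C x_t + v,
# with w ~ N(0, Q), v ~ N(0, R). Observations y_t are assumed to be
# low-dimensional appearance coefficients of the video frames (hypothetical).
import numpy as np

def kalman_smoother(Y, A, C, Q, R, mu0, P0):
    """E-step: forward Kalman filter + backward RTS smoother.
    Y: (T, p) observations. Returns smoothed means, covariances,
    and lag-one cross-covariances."""
    T, p = Y.shape
    k = A.shape[0]
    xf = np.zeros((T, k)); Pf = np.zeros((T, k, k))   # filtered estimates
    xp = np.zeros((T, k)); Pp = np.zeros((T, k, k))   # one-step predictions
    x, P = mu0, P0
    for t in range(T):
        if t > 0:
            x, P = A @ xf[t-1], A @ Pf[t-1] @ A.T + Q
        xp[t], Pp[t] = x, P
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        xf[t] = x + K @ (Y[t] - C @ x)
        Pf[t] = (np.eye(k) - K @ C) @ P
    xs = xf.copy(); Ps = Pf.copy()
    Pcross = np.zeros((T, k, k))                       # Cov(x_t, x_{t-1} | Y)
    for t in range(T - 2, -1, -1):
        J = Pf[t] @ A.T @ np.linalg.inv(Pp[t+1])
        xs[t] = xf[t] + J @ (xs[t+1] - xp[t+1])
        Ps[t] = Pf[t] + J @ (Ps[t+1] - Pp[t+1]) @ J.T
        Pcross[t+1] = Ps[t+1] @ J.T                    # common approximation
    return xs, Ps, Pcross

def em_lds(Y, k, n_iter=20, seed=0):
    """Fit LDS parameters (A, C, Q, R) to observations Y by EM."""
    rng = np.random.default_rng(seed)
    T, p = Y.shape
    A = np.eye(k); C = rng.standard_normal((p, k)) * 0.1
    Q = np.eye(k); R = np.eye(p); mu0 = np.zeros(k); P0 = np.eye(k)
    for _ in range(n_iter):
        xs, Ps, Pc = kalman_smoother(Y, A, C, Q, R, mu0, P0)
        Ext = Ps + np.einsum('ti,tj->tij', xs, xs)     # E[x_t x_t^T]
        S11 = Ext[1:].sum(0)
        S00 = Ext[:-1].sum(0)
        S10 = (Pc[1:] + np.einsum('ti,tj->tij', xs[1:], xs[:-1])).sum(0)
        # M-step: closed-form parameter updates
        A = S10 @ np.linalg.inv(S00)
        Q = (S11 - A @ S10.T) / (T - 1)
        Syx = Y.T @ xs                                  # sum_t y_t E[x_t]^T
        C = Syx @ np.linalg.inv(Ext.sum(0))
        R = (Y.T @ Y - C @ Syx.T) / T
        mu0, P0 = xs[0], Ps[0]
    return A, C, Q, R, xs

if __name__ == "__main__":
    # Toy check on a synthetic slowly-drifting sequence standing in for
    # projected frame coefficients.
    rng = np.random.default_rng(1)
    Y = np.cumsum(rng.standard_normal((200, 5)), axis=0) * 0.1
    A, C, Q, R, states = em_lds(Y, k=3)
    print("Estimated transition matrix A:\n", A.round(2))
```

In this kind of setup, the learned dynamics matrix A encodes how appearance evolves in time, while C maps latent states back to image-like representations; grouping frames by which dynamics explain them well is one plausible route to the invariance-based summaries the abstract describes.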