Towards a comprehensive computational model foraesthetic assessment of videos

Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI:10.1145/2502081.2508119

Subhabrata Bhattacharya, Behnaz Nojavanasghari, Tao Chen, Dong Liu, Shih-Fu Chang, M. Shah

{"title":"Towards a comprehensive computational model foraesthetic assessment of videos","authors":"Subhabrata Bhattacharya, Behnaz Nojavanasghari, Tao Chen, Dong Liu, Shih-Fu Chang, M. Shah","doi":"10.1145/2502081.2508119","DOIUrl":null,"url":null,"abstract":"In this paper we propose a novel aesthetic model emphasizing psycho-visual statistics extracted from multiple levels in contrast to earlier approaches that rely only on descriptors suited for image recognition or based on photographic principles. At the lowest level, we determine dark-channel, sharpness and eye-sensitivity statistics over rectangular cells within a frame. At the next level, we extract Sentibank features (1,200 pre-trained visual classifiers) on a given frame, that invoke specific sentiments such as \"colorful clouds\", \"smiling face\" etc. and collect the classifier responses as frame-level statistics. At the topmost level, we extract trajectories from video shots. Using viewer's fixation priors, the trajectories are labeled as foreground, and background/camera on which statistics are computed. Additionally, spatio-temporal local binary patterns are computed that capture texture variations in a given shot. Classifiers are trained on individual feature representations independently. On thorough evaluation of 9 different types of features, we select the best features from each level -- dark channel, affect and camera motion statistics. Next, corresponding classifier scores are integrated in a sophisticated low-rank fusion framework to improve the final prediction scores. Our approach demonstrates strong correlation with human prediction on 1,000 broadcast quality videos released by NHK as an aesthetic evaluation dataset.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"46 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"67","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2502081.2508119","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 67

Abstract

In this paper we propose a novel aesthetic model emphasizing psycho-visual statistics extracted from multiple levels in contrast to earlier approaches that rely only on descriptors suited for image recognition or based on photographic principles. At the lowest level, we determine dark-channel, sharpness and eye-sensitivity statistics over rectangular cells within a frame. At the next level, we extract Sentibank features (1,200 pre-trained visual classifiers) on a given frame, that invoke specific sentiments such as "colorful clouds", "smiling face" etc. and collect the classifier responses as frame-level statistics. At the topmost level, we extract trajectories from video shots. Using viewer's fixation priors, the trajectories are labeled as foreground, and background/camera on which statistics are computed. Additionally, spatio-temporal local binary patterns are computed that capture texture variations in a given shot. Classifiers are trained on individual feature representations independently. On thorough evaluation of 9 different types of features, we select the best features from each level -- dark channel, affect and camera motion statistics. Next, corresponding classifier scores are integrated in a sophisticated low-rank fusion framework to improve the final prediction scores. Our approach demonstrates strong correlation with human prediction on 1,000 broadcast quality videos released by NHK as an aesthetic evaluation dataset.

查看原文本刊更多论文

面向视频审美评价的综合计算模型

在本文中，我们提出了一种新的美学模型，强调从多个层面提取的心理视觉统计，而不是仅仅依赖于适合图像识别的描述符或基于摄影原理的早期方法。在最低层次上，我们在一个框架内的矩形单元上确定暗通道、锐度和眼灵敏度统计。在下一层，我们在给定的框架上提取Sentibank特征(1200个预训练的视觉分类器)，这些特征调用特定的情感，如“彩云”、“笑脸”等，并收集分类器响应作为框架级统计。在最顶层，我们从视频镜头中提取轨迹。利用观看者的注视先验，将轨迹标记为前景，并在其上计算统计数据的背景/相机。此外，计算时空局部二进制模式，以捕获给定镜头中的纹理变化。分类器是在单个特征表示上独立训练的。在对9种不同类型的特征进行全面评估后，我们从每个级别中选择最佳特征——暗通道、影响和相机运动统计。接下来，将相应的分类器分数整合到一个复杂的低秩融合框架中，以提高最终的预测分数。我们的方法与人类对NHK作为美学评估数据集发布的1000个广播质量视频的预测具有很强的相关性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 21st ACM international conference on Multimedia

自引率

0.00%

发文量