Deep Active Learning with Contaminated Tags for Image Aesthetics Assessment.

IF 10.8 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Image Processing | Pub Date: 2018-04-18 | DOI: 10.1109/TIP.2018.2828326

Zhenguang Liu, Zepeng Wang, Yiyang Yao, Luming Zhang, Ling Shao

Citations: 0

Abstract

Image aesthetic quality assessment has become an indispensable technique that facilitates a variety of image applications, e.g., photo retargeting and non-realistic rendering. Conventional approaches suffer from the following limitations: 1) the inefficiency of semantically describing images due to inherent tag noise and incompleteness, 2) the difficulty of accurately reflecting how humans actively perceive various regions inside each image, and 3) the challenge of incorporating the aesthetic experiences of multiple users. To solve these problems, we propose a novel semi-supervised deep active learning (SDAL) algorithm, which discovers how humans perceive semantically important regions from a large quantity of images that are only partially annotated with contaminated tags. More specifically, as humans usually attend to the foreground objects before understanding them, we extract a succinct set of BING (binarized normed gradients) [60]-based object patches from each image. To simulate human visual perception, we propose SDAL, which hierarchically learns the human gaze shifting path (GSP) by sequentially linking semantically important object patches in each scene. Notably, SDAL unifies the discovery of semantically important regions and deep GSP feature learning into a principled framework, in which only a small proportion of tagged images is required. Moreover, based on a sparsity penalty, SDAL can optimally discard noisy or redundant low-level image features. Finally, by leveraging the deeply learned GSP features, a probabilistic model is developed for image aesthetics assessment, in which the experience of multiple professional photographers can be encoded. In addition, auxiliary quality-related features can be conveniently integrated into our probabilistic model. Comprehensive experiments on a series of benchmark image sets have demonstrated the superiority of our method. As a byproduct, eye tracking experiments have shown that GSPs generated by our SDAL are about 93% consistent with real human gaze shifting paths.
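The abstract states that a sparsity penalty lets the model discard noisy or redundant low-level features, but it does not give the formulation. As a rough illustration only, the sketch below assumes an L1-style penalty and applies soft-thresholding (the proximal operator of the L1 norm) to a vector of feature weights. The function names, the weights, and the threshold `lam` are all hypothetical and are not taken from the paper.

```python
def soft_threshold(weights, lam):
    """Proximal operator of lam * ||w||_1: shrink each weight toward zero by lam.

    Weights whose magnitude is at most lam become exactly zero, which is how
    an L1-style penalty removes weak (noisy or redundant) features.
    """
    shrunk = []
    for w in weights:
        if w > lam:
            shrunk.append(w - lam)
        elif w < -lam:
            shrunk.append(w + lam)
        else:
            shrunk.append(0.0)
    return shrunk


def selected_features(weights, lam):
    """Return the indices of features whose weight survives the shrinkage."""
    return [i for i, w in enumerate(soft_threshold(weights, lam)) if w != 0.0]


# Hypothetical weights for five low-level image features (made-up values).
weights = [0.9, -0.05, 0.02, -0.6, 0.1]
kept = selected_features(weights, lam=0.2)
print(kept)  # indices of the features that are kept
```

With these made-up values, only the two features with large-magnitude weights survive; the three weak ones are set exactly to zero. The actual SDAL objective may use a different penalty or optimizer.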

Source journal

IEEE Transactions on Image Processing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

20.90

Self-citation rate

6.60%

Articles published

774

Review time

7.6 months

Journal description: The IEEE Transactions on Image Processing delves into groundbreaking theories, algorithms, and structures concerning the generation, acquisition, manipulation, transmission, scrutiny, and presentation of images, video, and multidimensional signals across diverse applications. Topics span mathematical, statistical, and perceptual aspects, encompassing modeling, representation, formation, coding, filtering, enhancement, restoration, rendering, halftoning, search, and analysis of images, video, and multidimensional signals. Pertinent applications range from image and video communications to electronic imaging, biomedical imaging, image and video systems, and remote sensing.