混杂图像识别:局部和全局信息在图像分类中的作用

2011 International Conference on Computer Vision Pub Date : 2011-11-06 DOI:10.1109/ICCV.2011.6126283

Devi Parikh

{"title":"混杂图像识别:局部和全局信息在图像分类中的作用","authors":"Devi Parikh","doi":"10.1109/ICCV.2011.6126283","DOIUrl":null,"url":null,"abstract":"The performance of current state-of-the-art computer vision algorithms at image classification falls significantly short as compared to human abilities. To reduce this gap, it is important for the community to know what problems to solve, and not just how to solve them. Towards this goal, via the use of jumbled images, we strip apart two widely investigated aspects: local and global information in images, and identify the performance bottleneck. Interestingly, humans have been shown to reliably recognize jumbled images. The goal of our paper is to determine a functional model that mimics how humans recognize jumbled images i.e. exploit local information alone, and further evaluate if existing implementations of this computational model suffice to match human performance. Surprisingly, in our series of human studies and machine experiments, we find that a simple bag-of-words based majority-vote-like strategy is an accurate functional model of how humans recognize jumbled images. Moreover, a straightforward machine implementation of this model achieves accuracies similar to human subjects at classifying jumbled images. This indicates that perhaps existing machine vision techniques already leverage local information from images effectively, and future research efforts should be focused on more advanced modeling of global information.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"196 1","pages":"519-526"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"45","resultStr":"{\"title\":\"Recognizing jumbled images: The role of local and global information in image classification\",\"authors\":\"Devi Parikh\",\"doi\":\"10.1109/ICCV.2011.6126283\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The performance of current state-of-the-art computer vision algorithms at image classification falls significantly short as compared to human abilities. To reduce this gap, it is important for the community to know what problems to solve, and not just how to solve them. Towards this goal, via the use of jumbled images, we strip apart two widely investigated aspects: local and global information in images, and identify the performance bottleneck. Interestingly, humans have been shown to reliably recognize jumbled images. The goal of our paper is to determine a functional model that mimics how humans recognize jumbled images i.e. exploit local information alone, and further evaluate if existing implementations of this computational model suffice to match human performance. Surprisingly, in our series of human studies and machine experiments, we find that a simple bag-of-words based majority-vote-like strategy is an accurate functional model of how humans recognize jumbled images. Moreover, a straightforward machine implementation of this model achieves accuracies similar to human subjects at classifying jumbled images. This indicates that perhaps existing machine vision techniques already leverage local information from images effectively, and future research efforts should be focused on more advanced modeling of global information.\",\"PeriodicalId\":6391,\"journal\":{\"name\":\"2011 International Conference on Computer Vision\",\"volume\":\"196 1\",\"pages\":\"519-526\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"45\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 International Conference on Computer Vision\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCV.2011.6126283\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV.2011.6126283","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 45

摘要

目前最先进的计算机视觉算法在图像分类方面的表现与人类的能力相比明显不足。为了缩小这一差距，社区必须知道要解决什么问题，而不仅仅是如何解决问题。为了实现这一目标，通过使用混乱的图像，我们剥离了两个广泛研究的方面:图像中的局部和全局信息，并确定了性能瓶颈。有趣的是，人类已经被证明能够可靠地识别杂乱的图像。我们论文的目标是确定一个模拟人类如何识别混乱图像的功能模型，即单独利用局部信息，并进一步评估该计算模型的现有实现是否足以匹配人类的表现。令人惊讶的是，在我们的一系列人类研究和机器实验中，我们发现一个简单的基于词袋的多数投票策略是人类如何识别混乱图像的准确功能模型。此外，该模型的直接机器实现在对混乱图像进行分类时达到了与人类受试者相似的精度。这表明，也许现有的机器视觉技术已经有效地利用了图像中的局部信息，未来的研究工作应该集中在更先进的全局信息建模上。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Recognizing jumbled images: The role of local and global information in image classification

The performance of current state-of-the-art computer vision algorithms at image classification falls significantly short as compared to human abilities. To reduce this gap, it is important for the community to know what problems to solve, and not just how to solve them. Towards this goal, via the use of jumbled images, we strip apart two widely investigated aspects: local and global information in images, and identify the performance bottleneck. Interestingly, humans have been shown to reliably recognize jumbled images. The goal of our paper is to determine a functional model that mimics how humans recognize jumbled images i.e. exploit local information alone, and further evaluate if existing implementations of this computational model suffice to match human performance. Surprisingly, in our series of human studies and machine experiments, we find that a simple bag-of-words based majority-vote-like strategy is an accurate functional model of how humans recognize jumbled images. Moreover, a straightforward machine implementation of this model achieves accuracies similar to human subjects at classifying jumbled images. This indicates that perhaps existing machine vision techniques already leverage local information from images effectively, and future research efforts should be focused on more advanced modeling of global information.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 International Conference on Computer Vision

自引率

0.00%

发文量