Recognizing Visual Composite in Real Images

Lin Bai, Kan Li, Shuai Jiang
Published in: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1-8
Publication date: 2015-07-12
DOI: 10.1109/IJCNN.2015.7280523 (https://doi.org/10.1109/IJCNN.2015.7280523)
Citations: 2

Abstract

Automatically discovering and recognizing the main structured visual pattern of an image is a challenging problem. The main difficulties are finding the component objects and recognizing the interactions among them. The component objects of a structured visual pattern have a consistent 3D spatial co-occurrence layout across images, which manifests itself as a predictable pattern called a visual composite. In this paper, we propose a visual composite recognition model that automatically discovers and recognizes the visual composite of an image. First, the model learns 3D spatial co-occurrence statistics among objects to discover the potential structured visual pattern of an image, thereby capturing the component objects of the visual composite. Second, we construct a feedforward architecture using the proposed factored three-way interaction machine to recognize the visual composite, which casts recognition as a structured prediction task: the model predicts the visual composite by maximizing the probability of the correct structured label given the component objects and their 3D spatial context. Experiments on a six-class sports dataset and a phrasal recognition dataset show encouraging discovery precision and recognition accuracy compared with competing approaches.
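The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea behind a factored three-way interaction: object features and 3D spatial-context features are each projected onto a shared set of factors, multiplied factor by factor, and mapped to a score per composite label, with the prediction being the label of highest softmax probability. The names `U`, `V`, `W`, `label_probabilities`, and `predict`, the feature layout, and the toy weights are all hypothetical illustrations and are not taken from the paper.

```python
import math


def project(vec, mat):
    """Return [sum_i vec[i] * mat[i][f] for each factor f] (vec times mat)."""
    n_factors = len(mat[0])
    return [sum(v * row[f] for v, row in zip(vec, mat)) for f in range(n_factors)]


def label_probabilities(obj_feat, spatial_feat, U, V, W):
    """Softmax over labels from a factored three-way interaction (assumed form).

    score[y] = sum_f (obj_feat . U[:, f]) * (spatial_feat . V[:, f]) * W[y][f]
    U: object-feature-by-factor weights, V: spatial-feature-by-factor weights,
    W: label-by-factor weights. All three are hypothetical parameters.
    """
    obj_factors = project(obj_feat, U)
    spatial_factors = project(spatial_feat, V)
    # Multiplicative interaction: each factor is active only if both inputs agree.
    gated = [a * b for a, b in zip(obj_factors, spatial_factors)]
    scores = [sum(g * w for g, w in zip(gated, w_row)) for w_row in W]
    # Numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(obj_feat, spatial_feat, U, V, W):
    """Return the index of the label with the highest probability."""
    probs = label_probabilities(obj_feat, spatial_feat, U, V, W)
    return max(range(len(probs)), key=probs.__getitem__)


# Toy example with hand-picked weights: 2 object features, 2 spatial features,
# 2 factors, 3 candidate composite labels.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[0.0, 1.0], [1.0, 0.0]]
W = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]

obj_feat = [1.0, 0.0]
spatial_feat = [0.0, 1.0]

# Factor activations are [1, 0] for both inputs, so the scores are
# [2.0, 0.0, -1.0] and label 0 has the highest probability.
print(predict(obj_feat, spatial_feat, U, V, W))
```

In a trained model the three weight matrices would be learned by maximizing the log-probability of the correct label over a training set; the sketch above covers only the forward pass and leaves out learning and the co-occurrence-based discovery step.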