识别真实图像中的视觉合成

2015 International Joint Conference on Neural Networks (IJCNN) Pub Date : 2015-07-12 DOI:10.1109/IJCNN.2015.7280523

Lin Bai, Kan Li, Shuai Jiang

{"title":"识别真实图像中的视觉合成","authors":"Lin Bai, Kan Li, Shuai Jiang","doi":"10.1109/IJCNN.2015.7280523","DOIUrl":null,"url":null,"abstract":"Automatically discovering and recognizing the main structured visual pattern of an image is a challenging problem. The most difficulties are how to find the component objects and how to recognize the interaction among these objects. The component objects of the structured visual pattern have consistent 3D spatial co-occurrence layout across images, which manifest themselves as a predictable pattern called visual composite. In this paper, we propose a visual composite recognition model to automatically discover and recognize the visual composite of an image. Our model firstly learns 3D spatial co-occurrence statistics among objects to discover the potential structured visual pattern of an image so that it captures the component objects of visual composite. Secondly, we construct a feedforward architecture using the proposed factored three-way interaction machine to recognize the visual composite, which casts the recognition problem as a structured prediction task. It predicts the visual composite by maximizing the probability of the correct structured label given the component objects and their 3D spatial context. Experiments conducted on a six-class sports dataset and a phrasal recognition dataset respectively demonstrate the encouraging performance of our model in discovery precision and recognition accuracy compared with competing approaches.","PeriodicalId":6539,"journal":{"name":"2015 International Joint Conference on Neural Networks (IJCNN)","volume":"1 1","pages":"1-8"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Recognizing visual composite in real images\",\"authors\":\"Lin Bai, Kan Li, Shuai Jiang\",\"doi\":\"10.1109/IJCNN.2015.7280523\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatically discovering and recognizing the main structured visual pattern of an image is a challenging problem. The most difficulties are how to find the component objects and how to recognize the interaction among these objects. The component objects of the structured visual pattern have consistent 3D spatial co-occurrence layout across images, which manifest themselves as a predictable pattern called visual composite. In this paper, we propose a visual composite recognition model to automatically discover and recognize the visual composite of an image. Our model firstly learns 3D spatial co-occurrence statistics among objects to discover the potential structured visual pattern of an image so that it captures the component objects of visual composite. Secondly, we construct a feedforward architecture using the proposed factored three-way interaction machine to recognize the visual composite, which casts the recognition problem as a structured prediction task. It predicts the visual composite by maximizing the probability of the correct structured label given the component objects and their 3D spatial context. Experiments conducted on a six-class sports dataset and a phrasal recognition dataset respectively demonstrate the encouraging performance of our model in discovery precision and recognition accuracy compared with competing approaches.\",\"PeriodicalId\":6539,\"journal\":{\"name\":\"2015 International Joint Conference on Neural Networks (IJCNN)\",\"volume\":\"1 1\",\"pages\":\"1-8\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Joint Conference on Neural Networks (IJCNN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN.2015.7280523\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.2015.7280523","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

自动发现和识别图像的主要结构视觉模式是一个具有挑战性的问题。最困难的是如何找到组件对象以及如何识别这些对象之间的交互。结构化视觉模式的组件对象在图像之间具有一致的三维空间共现布局，这表现为一种可预测的模式，称为视觉复合。本文提出了一种视觉合成识别模型，用于自动发现和识别图像的视觉合成。我们的模型首先学习对象之间的三维空间共现统计，发现图像潜在的结构化视觉模式，从而捕获视觉复合的组成对象。其次，我们利用提出的因子三向交互机器构造前馈结构来识别视觉组合，将识别问题转化为结构化预测任务。它通过最大化给定组件对象及其3D空间上下文的正确结构化标签的概率来预测视觉组合。在一个六类运动数据集和一个短语识别数据集上进行的实验表明，与竞争对手的方法相比，我们的模型在发现精度和识别精度方面都有令人鼓舞的表现。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Recognizing visual composite in real images

Automatically discovering and recognizing the main structured visual pattern of an image is a challenging problem. The most difficulties are how to find the component objects and how to recognize the interaction among these objects. The component objects of the structured visual pattern have consistent 3D spatial co-occurrence layout across images, which manifest themselves as a predictable pattern called visual composite. In this paper, we propose a visual composite recognition model to automatically discover and recognize the visual composite of an image. Our model firstly learns 3D spatial co-occurrence statistics among objects to discover the potential structured visual pattern of an image so that it captures the component objects of visual composite. Secondly, we construct a feedforward architecture using the proposed factored three-way interaction machine to recognize the visual composite, which casts the recognition problem as a structured prediction task. It predicts the visual composite by maximizing the probability of the correct structured label given the component objects and their 3D spatial context. Experiments conducted on a six-class sports dataset and a phrasal recognition dataset respectively demonstrate the encouraging performance of our model in discovery precision and recognition accuracy compared with competing approaches.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 International Joint Conference on Neural Networks (IJCNN)

自引率

0.00%

发文量