Class Probability-based Visual and Contextual Feature Integration for Image Parsing

2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ). Pub Date: 2020-11-25. DOI: 10.1109/IVCNZ51579.2020.9290686

Basim Azam, Ranju Mandal, Ligang Zhang, B. Verma


Citations: 1

Abstract


Class Probability-based Visual and Contextual Feature Integration for Image Parsing

Deep learning networks have become one of the most promising architectures for image parsing tasks. Although existing deep networks consider global and local contextual information of images to learn coarse features individually, they lack automatic adaptation to the contextual properties of scenes. In this work, we present a visual and contextual feature-based deep network for image parsing. The main novelty is a 3-layer architecture that considers contextual information, in which each layer is independently trained and then integrated. The network explores contextual features along with visual features for class label prediction with class-specific classifiers. The contextual features capture prior information learned by calculating the co-occurrence of object labels both within a whole scene and between neighboring superpixels. The class-specific classifiers deal with the imbalance of data across object categories and learn the coarse features for every category individually. A series of weak classifiers in combination with boosting algorithms are investigated as classifiers along with the aggregated contextual features. Experiments on the benchmark Stanford Background dataset show that the proposed architecture produces the highest average accuracy and comparable global accuracy.
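
The abstract describes contextual features built from the co-occurrence of object labels between neighboring superpixels and then combined with visual class probabilities. The following is only a minimal sketch of that general idea, not the authors' implementation: the function names, the Laplace smoothing, the neighbor-averaging rule, and the fixed fusion `weight` are all assumptions made for illustration, and the paper's boosted class-specific classifiers are not reproduced here.

```python
from collections import defaultdict


def cooccurrence_prior(label_maps, adjacency_lists, num_classes, smoothing=1.0):
    """Estimate P(neighbor label = j | label = i) from labeled superpixel graphs.

    label_maps[k][s] is the class index of superpixel s in training image k.
    adjacency_lists[k] is a list of (s, t) pairs of neighboring superpixels.
    The smoothing constant is an assumption, added to avoid zero probabilities.
    """
    counts = [[smoothing] * num_classes for _ in range(num_classes)]
    for labels, edges in zip(label_maps, adjacency_lists):
        for s, t in edges:
            # Adjacency is symmetric, so count the pair in both directions.
            counts[labels[s]][labels[t]] += 1.0
            counts[labels[t]][labels[s]] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def contextual_scores(visual_probs, edges, prior):
    """Score each class at each superpixel by its compatibility with neighbors.

    visual_probs[s][j] is the visual class probability of class j at superpixel s.
    Superpixels without neighbors receive a uniform contextual distribution.
    """
    num_classes = len(prior)
    neighbors = defaultdict(list)
    for s, t in edges:
        neighbors[s].append(t)
        neighbors[t].append(s)

    scores = []
    for s in range(len(visual_probs)):
        if not neighbors[s]:
            scores.append([1.0 / num_classes] * num_classes)
            continue
        row = []
        for i in range(num_classes):
            # Expected co-occurrence of class i with the neighbors' predicted classes.
            total = sum(
                prior[i][j] * visual_probs[t][j]
                for t in neighbors[s]
                for j in range(num_classes)
            )
            row.append(total / len(neighbors[s]))
        z = sum(row)
        scores.append([v / z for v in row])
    return scores


def integrate(visual_probs, context_probs, weight=0.5):
    """Fuse visual and contextual class probabilities by a convex combination."""
    return [
        [(1.0 - weight) * v + weight * c for v, c in zip(v_row, c_row)]
        for v_row, c_row in zip(visual_probs, context_probs)
    ]


if __name__ == "__main__":
    # Toy example: one training image with three superpixels in a chain.
    prior = cooccurrence_prior([[0, 0, 1]], [[(0, 1), (1, 2)]], num_classes=2)
    visual = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
    context = contextual_scores(visual, [(0, 1), (1, 2)], prior)
    for row in integrate(visual, context):
        print([round(p, 3) for p in row])
```

In this sketch the fixed `weight` is the simplest possible integration rule. The abstract instead reports that independently trained layers are integrated and that boosted weak classifiers are applied to the aggregated contextual features, so a learned combination would be closer to the described architecture.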
