Multi-granularity sequence generation for hierarchical image classification

IF 18.3 3区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computational Visual Media Pub Date : 2024-01-03 DOI:10.1007/s41095-022-0332-2

Xinda Liu, Lili Wang

{"title":"Multi-granularity sequence generation for hierarchical image classification","authors":"Xinda Liu, Lili Wang","doi":"10.1007/s41095-022-0332-2","DOIUrl":null,"url":null,"abstract":"<p>Hierarchical multi-granularity image classification is a challenging task that aims to tag each given image with multiple granularity labels simultaneously. Existing methods tend to overlook that different image regions contribute differently to label prediction at different granularities, and also insufficiently consider relationships between the hierarchical multi-granularity labels. We introduce a sequence-to-sequence mechanism to overcome these two problems and propose a multi-granularity sequence generation (MGSG) approach for the hierarchical multi-granularity image classification task. Specifically, we introduce a transformer architecture to encode the image into visual representation sequences. Next, we traverse the taxonomic tree and organize the multi-granularity labels into sequences, and vectorize them and add positional information. The proposed multi-granularity sequence generation method builds a decoder that takes visual representation sequences and semantic label embedding as inputs, and outputs the predicted multi-granularity label sequence. The decoder models dependencies and correlations between multi-granularity labels through a masked multi-head self-attention mechanism, and relates visual information to the semantic label information through a cross-modality attention mechanism. In this way, the proposed method preserves the relationships between labels at different granularity levels and takes into account the influence of different image regions on labels with different granularities. Evaluations on six public benchmarks qualitatively and quantitatively demonstrate the advantages of the proposed method. Our project is available at https://github.com/liuxindazz/mgsg.\n</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"19 1","pages":""},"PeriodicalIF":18.3000,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Visual Media","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s41095-022-0332-2","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

Hierarchical multi-granularity image classification is a challenging task that aims to tag each given image with multiple granularity labels simultaneously. Existing methods tend to overlook that different image regions contribute differently to label prediction at different granularities, and also insufficiently consider relationships between the hierarchical multi-granularity labels. We introduce a sequence-to-sequence mechanism to overcome these two problems and propose a multi-granularity sequence generation (MGSG) approach for the hierarchical multi-granularity image classification task. Specifically, we introduce a transformer architecture to encode the image into visual representation sequences. Next, we traverse the taxonomic tree and organize the multi-granularity labels into sequences, and vectorize them and add positional information. The proposed multi-granularity sequence generation method builds a decoder that takes visual representation sequences and semantic label embedding as inputs, and outputs the predicted multi-granularity label sequence. The decoder models dependencies and correlations between multi-granularity labels through a masked multi-head self-attention mechanism, and relates visual information to the semantic label information through a cross-modality attention mechanism. In this way, the proposed method preserves the relationships between labels at different granularity levels and takes into account the influence of different image regions on labels with different granularities. Evaluations on six public benchmarks qualitatively and quantitatively demonstrate the advantages of the proposed method. Our project is available at https://github.com/liuxindazz/mgsg.

Abstract Image

查看原文本刊更多论文

为分层图像分类生成多粒度序列

分层多粒度图像分类是一项具有挑战性的任务，其目的是为每幅给定图像同时标记多个粒度标签。现有方法往往忽略了不同图像区域对不同粒度标签预测的贡献不同，也没有充分考虑分层多粒度标签之间的关系。我们引入了序列到序列机制来克服这两个问题，并针对分层多粒度图像分类任务提出了一种多粒度序列生成（MGSG）方法。具体来说，我们引入了一种转换器架构，将图像编码为视觉表示序列。然后，我们遍历分类树，将多粒度标签组织成序列，并将其矢量化和添加位置信息。所提出的多粒度序列生成方法建立了一个解码器，将视觉表示序列和语义标签嵌入作为输入，并输出预测的多粒度标签序列。解码器通过遮蔽式多头自我注意机制对多粒度标签之间的依赖性和相关性进行建模，并通过跨模态注意机制将视觉信息与语义标签信息联系起来。这样，所提出的方法就保留了不同粒度标签之间的关系，并考虑到了不同图像区域对不同粒度标签的影响。通过对六个公共基准的评估，定性和定量地证明了所提方法的优势。我们的项目见 https://github.com/liuxindazz/mgsg。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Computational Visual Media Computer Science-Computer Graphics and Computer-Aided Design

CiteScore

16.90

自引率

5.80%

发文量

243

审稿时长

6 weeks

期刊介绍： Computational Visual Media is a peer-reviewed open access journal. It publishes original high-quality research papers and significant review articles on novel ideas, methods, and systems relevant to visual media. Computational Visual Media publishes articles that focus on, but are not limited to, the following areas: • Editing and composition of visual media • Geometric computing for images and video • Geometry modeling and processing • Machine learning for visual media • Physically based animation • Realistic rendering • Recognition and understanding of visual media • Visual computing for robotics • Visualization and visual analytics Other interdisciplinary research into visual media that combines aspects of computer graphics, computer vision, image and video processing, geometric computing, and machine learning is also within the journal''s scope. This is an open access journal, published quarterly by Tsinghua University Press and Springer. The open access fees (article-processing charges) are fully sponsored by Tsinghua University, China. Authors can publish in the journal without any additional charges.