Logit prototype learning with active multimodal representation for robust open-set recognition

IF 7.3 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Yimin Fu, Zhunga Liu, Zicheng Wang
{"title":"Logit prototype learning with active multimodal representation for robust open-set recognition","authors":"Yimin Fu, Zhunga Liu, Zicheng Wang","doi":"10.1007/s11432-023-3924-x","DOIUrl":null,"url":null,"abstract":"<p>Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, the existing OSR methods are primarily implemented on the basis of single-modal perception, and their performance is limited when single-modal data fail to provide sufficient descriptions of the objects. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between different modalities. To effectively integrate multimodal data for robust OSR performance, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling a direct exploration of intermodal correlations without the impact of scale inconsistency. Then, the fusion weights of each modality are determined using an entropybased uncertainty estimation method. This approach realizes adaptive adjustment of the fusion strategy to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly optimized interactively to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and facilitate the distinction between known and unknown classes. Extensive experiments on three multimodal datasets have been done to demonstrate the effectiveness of the proposed method.</p>","PeriodicalId":21618,"journal":{"name":"Science China Information Sciences","volume":"8 1","pages":""},"PeriodicalIF":7.3000,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science China Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11432-023-3924-x","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, the existing OSR methods are primarily implemented on the basis of single-modal perception, and their performance is limited when single-modal data fail to provide sufficient descriptions of the objects. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between different modalities. To effectively integrate multimodal data for robust OSR performance, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling a direct exploration of intermodal correlations without the impact of scale inconsistency. Then, the fusion weights of each modality are determined using an entropy-based uncertainty estimation method. This approach realizes adaptive adjustment of the fusion strategy to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly optimized interactively to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and facilitate the distinction between known and unknown classes. Extensive experiments on three multimodal datasets have been conducted to demonstrate the effectiveness of the proposed method.
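The abstract gives no formulas, so the NumPy sketch below is only one illustrative reading of the two mechanisms it names: entropy-based fusion weights and a stepwise recognition rule over class prototypes in logit space. The function names, the exp(-H) weighting form, and the distance-threshold rejection rule are assumptions made for illustration, not the paper's actual method.

```python
import numpy as np

def entropy_fusion_weights(logits_per_modality):
    """Hypothetical entropy-based fusion: modalities whose logits yield
    low-entropy (confident) class distributions receive larger weights."""
    weights = []
    for z in logits_per_modality:
        p = np.exp(z - z.max())              # numerically stable softmax
        p /= p.sum()
        h = -np.sum(p * np.log(p + 1e-12))   # Shannon entropy of the prediction
        weights.append(np.exp(-h))           # assumed form: confident modality -> larger weight
    w = np.asarray(weights)
    return w / w.sum()

def stepwise_recognition(logits_per_modality, prototypes, reject_threshold):
    """Sketch of a stepwise rule: fuse the per-modality logits, match the
    nearest class prototype in logit space, and reject the input as
    'unknown' (-1) when the fused logits lie too far from every prototype."""
    w = entropy_fusion_weights(logits_per_modality)
    fused = sum(wi * zi for wi, zi in zip(w, logits_per_modality))
    dists = np.linalg.norm(prototypes - fused, axis=1)  # distance to each class prototype
    k = int(dists.argmin())
    return -1 if dists[k] > reject_threshold else k

# Toy usage: two modalities, three known classes.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(3, 3)) * 5
z_audio = prototypes[1] + rng.normal(scale=0.3, size=3)  # clean modality near class 1
z_image = rng.normal(size=3)                             # disturbed, uninformative modality
print(stepwise_recognition([z_audio, z_image], prototypes, reject_threshold=4.0))
```

In this toy setup the disturbed modality produces a high-entropy prediction and is down-weighted, so the fused logits stay near the prototype indicated by the clean modality; an input far from every prototype would be rejected as unknown, mirroring the adaptive-fusion and known/unknown-separation behavior the abstract describes.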

Source journal: Science China Information Sciences
CiteScore: 12.60
Self-citation rate: 5.70%
Annual publication volume: 224 articles
Average review time: 8.3 months
About the journal: Science China Information Sciences is a dedicated journal that showcases high-quality, original research across various domains of information sciences. It encompasses Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information, providing a platform for the dissemination of significant contributions in these fields.