Enhancing object recognition: The role of object knowledge decomposition and component-labeled datasets

IF 5.5, Zone 2 (Computer Science), JCR Q1 (Computer Science, Artificial Intelligence)

Neurocomputing, Pub Date: 2024-11-22, DOI: 10.1016/j.neucom.2024.128969

Nuoye Xiong, Ning Wang, Hongsheng Li, Guangming Zhu, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun

Neurocomputing, Volume 617, Article 128969. Publisher page: https://www.sciencedirect.com/science/article/pii/S0925231224017405

Citations: 0

Abstract

Deep learning models’ decision-making processes can be elusive, often raising concerns about their reliability. To address this, we have introduced the Object Knowledge Decomposition and Components Label Dataset (OKD-CL), designed to improve the interpretability and accuracy of object recognition models. This dataset includes 99 categories from PartImageNet, each detailed with clear physical structures that align with human visual concepts. In a hierarchical structure, every category is described by Abstract Component Knowledge (ACK) descriptions and each image instance comes with Explicit Visual Knowledge (EVK) masks, highlighting the visual components’ appearance. By evaluating multiple deep neural networks guided with ACK and EVK (dual-knowledge-guidance approach), we saw better accuracy and a higher Foreground Reasoning Ratio (FRR), confirming our knowledge-guided method’s effectiveness. When used on the Hard-ImageNet dataset, this approach reduced the model’s reliance on incorrect feature assumptions without sacrificing classification accuracy. This hierarchical comprehension encouraged by OKD-CL is crucial in minimizing incorrect feature associations and strengthening model robustness. The entire code and dataset are available on: https://github.com/XiGuaBo/OKD-CL.
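
The abstract names the Foreground Reasoning Ratio (FRR) without defining it. Purely as an illustration, the sketch below assumes one plausible reading: the share of a model's attribution (saliency) mass that falls on foreground pixels, where the foreground is given by a binary mask such as the union of an image's EVK component masks. The function name `foreground_ratio`, the use of absolute saliency values, and the toy inputs are assumptions made for this example; they are not taken from the paper or from the OKD-CL repository, whose actual metric may differ.

```python
from typing import Sequence


def foreground_ratio(
    saliency: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
) -> float:
    """Return the share of absolute saliency that lies on foreground pixels.

    Assumed definition (not from the paper): sum of |saliency| where the
    mask is non-zero, divided by the sum of |saliency| over the whole image.
    """
    if len(saliency) != len(mask):
        raise ValueError("saliency and mask must have the same number of rows")

    total = 0.0   # attribution mass over the whole image
    inside = 0.0  # attribution mass on foreground pixels only
    for sal_row, mask_row in zip(saliency, mask):
        if len(sal_row) != len(mask_row):
            raise ValueError("saliency and mask rows must have the same length")
        for value, is_foreground in zip(sal_row, mask_row):
            weight = abs(value)
            total += weight
            if is_foreground:
                inside += weight

    # An all-zero saliency map carries no evidence, so report 0.0.
    return inside / total if total > 0 else 0.0


# Toy 2x2 example: the bottom row is foreground.
saliency = [[0.0, 0.5], [1.5, 2.0]]
mask = [[0, 0], [1, 1]]
ratio = foreground_ratio(saliency, mask)
print(ratio)
```

Under this assumed definition, a higher value means more of the model's evidence comes from the object itself rather than from the background, which is the kind of behaviour the abstract reports for the knowledge-guided models.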

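Source journal: Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days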

About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics are neurocomputing theory, practice, and applications.