Sami Wood, Erin Lanus, Daniel D. Doyle, Jeremy Ogorzalek, C. Franck, Laura J. Freeman
Title: Developing Hierarchies for Image Classification Model Evaluation
DOI: 10.1109/AI4I51902.2021.00016
Published in: 2021 4th International Conference on Artificial Intelligence for Industries (AI4I), September 2021
Citations: 0
Abstract
Classes within computer vision (CV) datasets often exhibit hierarchical structure, such as super-subordinate IS-A relations. While some common performance metrics for evaluating CV models, such as "top-5 error," ignore hierarchical structure, metrics for hierarchical scoring do exist; their effectiveness for meaningful evaluation, however, depends on how well the hierarchy reflects important semantic relationships between classes. Most hierarchical scoring methods reward closeness between the predicted and ground-truth classes. Such schemes may produce the same score when a child is misclassified as a terrorist as when a car is misclassified as a vehicle or helicopter, ignoring the very different impacts of these misclassifications. An approach is needed for developing context-aware hierarchies, usable with existing evaluation metrics, that reflect the cost of misclassification. The contribution of this paper is a hierarchy construction framework that, given a list of importance-ordered categories and a hierarchical scoring method, penalizes misclassifications accordingly. The framework is demonstrated in a hierarchy selection use case and compared quantitatively against the "top-5 error" metric and a simple super-subordinate hierarchical scoring. We qualitatively discuss the efficacy and implications of each approach.
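To make the notion of "closeness" in hierarchical scoring concrete, the following is a minimal sketch of a common family of such metrics: distance to the lowest common ancestor (LCA) in the class tree. The toy hierarchy, class names, and edge-count cost are illustrative assumptions, not the paper's actual framework; they merely show how a vehicle-class confusion and a person-class confusion can receive identical penalties when context is ignored.

```python
# Sketch of hierarchical scoring via lowest common ancestor (LCA) distance.
# The hierarchy below is a hypothetical toy example, not the paper's.

# Each class maps to its parent; None marks the root.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "helicopter": "vehicle",
    "person": "entity",
    "child": "person",
    "terrorist": "person",
}


def path_to_root(node):
    """Return the chain of classes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def hierarchical_distance(pred, truth):
    """Count edges from prediction and truth up to their lowest common ancestor."""
    truth_path = path_to_root(truth)
    truth_ancestors = set(truth_path)
    # Walk up from the prediction until we reach an ancestor of the truth.
    for up, node in enumerate(path_to_root(pred)):
        if node in truth_ancestors:
            return up + truth_path.index(node)


print(hierarchical_distance("car", "car"))         # 0: exact match
print(hierarchical_distance("car", "helicopter"))  # 2: siblings under "vehicle"
print(hierarchical_distance("child", "terrorist")) # 2: same score, worse error
```

Note that the sibling confusions "car vs. helicopter" and "child vs. terrorist" both score 2 under this purely structural metric, which is exactly the context-insensitivity the paper's importance-ordered hierarchy construction is meant to address.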