The improvement of ground truth annotation in public datasets for human detection

IF 2.3 · Region 4, Computer Science · Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Vision and Applications · Pub Date: 2024-04-08 · DOI: 10.1007/s00138-024-01527-1

Sotheany Nou, Joong-Sun Lee, Nagaaki Ohyama, Takashi Obi


Citations: 0

Abstract

The quality of dataset annotations is crucial for supervised machine learning because it significantly affects model performance. Although many public datasets are widely used, they often suffer from annotation errors, including missing annotations and bounding boxes with incorrect sizes or positions. These errors reduce the accuracy of machine learning models. However, most researchers have traditionally focused on improving model performance by enhancing algorithms while overlooking data quality. This so-called model-centric AI approach has been predominant. In contrast, a data-centric AI approach, advocated by Andrew Ng at the Data and AI Summit 2022, emphasizes enhancing data quality while keeping the model fixed, which proves to be more efficient in improving performance. Building on this data-centric approach, we propose a method to enhance the quality of public datasets such as MS-COCO and Open Image Dataset. Our approach automatically retrieves missing annotations and corrects the size and position of existing bounding boxes in these datasets. Specifically, our study addresses human detection, one of the prominent applications of artificial intelligence. Experimental results demonstrate improved performance with models such as Faster-RCNN, EfficientDet, and RetinaNet. After applying both proposed methods to a dataset in which grouped instances have been converted to individual instances, we achieve an improvement of up to 32% in mAP over the original datasets. In summary, our methods significantly enhance model performance by improving annotation quality at a lower cost and in less time than the manual improvement employed in other studies.
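The abstract does not say how missing annotations are retrieved, so the following is only an illustrative sketch and not the authors' method. It assumes COCO-style boxes given as (x, y, width, height), a hypothetical list of detector outputs paired with confidence scores, and arbitrarily chosen thresholds. A confident detection that overlaps no existing annotation (measured by intersection over union, IoU) is flagged as a candidate missing annotation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed COCO-style (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def missing_candidates(
    annotated: List[Box],
    detections: List[Tuple[Box, float]],
    score_thresh: float = 0.9,  # hypothetical threshold, not from the paper
    iou_thresh: float = 0.5,    # hypothetical threshold, not from the paper
) -> List[Box]:
    """Return confident detections that match no existing annotation."""
    return [
        box
        for box, score in detections
        if score >= score_thresh
        and all(iou(box, gt) < iou_thresh for gt in annotated)
    ]


# Hypothetical data: one annotated person, three detector outputs.
annotated = [(10, 10, 50, 100)]
detections = [
    ((12, 11, 49, 98), 0.97),   # overlaps the existing annotation
    ((200, 40, 40, 90), 0.95),  # confident and unannotated
    ((300, 300, 20, 20), 0.30), # low confidence, ignored
]
print(missing_candidates(annotated, detections))  # [(200, 40, 40, 90)]
```

In a pipeline like the one the abstract describes, such candidates would presumably be reviewed or filtered further before being added to the dataset; that step is not shown here.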




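Source journal

Machine Vision and Applications (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore

6.30

Self-citation rate

3.00%

Articles published

Review time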

8.7 months

Journal description: Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submissions in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision are all within the scope of the journal. Particular emphasis is placed on engineering and technology aspects of image processing and computer vision. The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.