APPROACH TO THE AUTOMATIC CREATION OF AN ANNOTATED DATASET FOR THE DETECTION, LOCALIZATION AND CLASSIFICATION OF BLOOD CELLS IN AN IMAGE

Radio Electronics, Computer Science, Control. Pub Date: 2024-04-02. DOI: 10.15588/1607-3274-2024-1-12

S. M. Kovalenko, O. S. Kutsenko, S. V. Kovalenko, A. Kovalenko


Abstract

Context. The paper considers the problem of automating the creation of an annotated dataset for use in a deep learning system that detects, localizes and classifies blood cells in an image. The subject of the research is digital image processing for object detection and localization.

Objective. The aim of the study is to build a pipeline of digital image processing methods that automatically generates an annotated set of blood smear images. This set is then used to train and validate deep learning models, significantly reducing the time machine learning specialists spend on annotation.

Method. The proposed approach to object detection and localization is based on digital image processing methods such as filtering, thresholding, binarization, contour detection, and filling. The pipeline first handles white blood cells and platelets: noise reduction; conversion to the HSV color model; defining a mask for white blood cells and platelets; detecting their contours; determining the coordinates of the upper left and lower right corners of each bounding box; calculating the area of the region inside the bounding box; and saving the obtained data. It then handles red blood cells: determining the most common color in the image; filling the contours of the white blood cells and platelets with that color; defining a mask for red blood cells; detecting their contours; determining the coordinates of the upper left and lower right corners of each bounding box; and calculating the area of the region inside the bounding box. Finally, the data about the found objects are entered into a dataframe and saved to a .csv file for future use. Given the unlabeled image dataset and the generated .csv file, any researcher should be able to recreate the labeled dataset using image processing libraries.

Results. The developed approach was implemented in software for creating an annotated dataset of blood smear images.

Conclusions. The study proposes and justifies an approach to automatically creating a set of annotated data. The pipeline was tested on a set of unlabeled data and produced a labeled set consisting of cell images and a .csv file with the attributes "file name", "type", "xmin", "ymin", "xmax", "ymax", and "area", where "xmin", "ymin", "xmax", and "ymax" are the coordinates of the bounding box of each object. The numbers of correctly recognized, incorrectly recognized, and unrecognized objects were counted manually, and metrics were calculated from them to assess the accuracy and quality of object detection and localization.
