Semi-supervised labelling of chest x-ray images using unsupervised clustering for ground-truth generation

Research Journal of Applied Sciences, Engineering and Technology. Pub Date: 2023-09-12. DOI: 10.31763/aet.v2i3.1143

Victor Ikechukwu Agughasi, Murali Srinivasiah



Abstract

Supervised classifiers require large amounts of accurately labelled data to learn to recognize chest X-ray (CXR) images. However, manually labelling an extensive collection of CXR images is time-consuming and costly. To address this issue, a method for the semi-supervised labelling of extensive CXR image collections is proposed that leverages unsupervised clustering with minimal expert knowledge to generate ground-truth images. The proposed methodology has four steps. First, unsupervised clustering techniques such as K-Means and Self-Organizing Maps are applied. Second, each image is represented by five different feature vectors so that the differences between features can be exploited fully. Third, each data point receives the label of the center of the cluster to which it belongs. Finally, a majority vote decides the ground-truth image. The amount of human involvement is strictly limited by the number of clusters that the chosen method creates. To evaluate the effectiveness of the proposed method, experiments were conducted on two publicly available CXR datasets, VinDr-CXR and Montgomery. The experiments showed that, for a KNN classifier, manually labelling only 1% (VinDr-CXR) or 10% (Montgomery) of the training data gives performance similar to labelling the whole dataset. To our knowledge, this is the first study to use the VinDr-CXR and Montgomery datasets for ground-truth image generation. Extensive experimental analysis using machine learning and statistical techniques shows that the proposed methodology efficiently generates ground-truth images from publicly available CXR datasets.
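The four steps in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes plain K-Means only (no Self-Organizing Map), three toy feature views rather than five, a deterministic farthest-point initialisation, and that the expert labels the cluster member nearest to each center. The names `label_view`, `majority_vote`, and `ask_expert` are hypothetical, and the data are synthetic.

```python
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(points, centers):
    """Index of the closest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=10):
    """Plain K-Means; farthest-point init is an assumption made for determinism."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        assign = nearest(points, centers)
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, nearest(points, centers)

def label_view(points, k, ask_expert):
    """Cluster one feature view, query the expert once per cluster, propagate."""
    centers, assign = kmeans(points, k)
    labels = [None] * len(points)
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        # Assumption: the member nearest to the center stands in for the center.
        rep = min(members, key=lambda i: dist2(points[i], centers[c]))
        cluster_label = ask_expert(rep)
        for i in members:
            labels[i] = cluster_label
    return labels

def majority_vote(per_view_labels):
    """Final label per image = most common label across the feature views."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_view_labels)]

# Synthetic toy data: 8 "images", the first four normal, the last four abnormal.
truth = ["normal"] * 4 + ["abnormal"] * 4
views = [
    [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.1, 0.2),
     (5.0, 5.1), (5.1, 5.0), (5.2, 5.1), (5.1, 5.2)],
    [(x,) for x in [1.0, 1.1, 0.9, 1.2, 4.0, 4.2, 4.1, 3.9]],
    # In this view image 3 sits in the wrong cluster on purpose.
    [(x,) for x in [0.0, 0.2, 0.1, 3.0, 3.3, 3.4, 3.5, 3.3]],
]

queried = []  # records every image index shown to the simulated expert

def ask_expert(index):
    queried.append(index)
    return truth[index]

per_view = [label_view(v, k=2, ask_expert=ask_expert) for v in views]
ground_truth = majority_vote(per_view)
print(ground_truth)
print("expert queries:", len(queried))
```

In this toy run the third view mislabels one image, and the vote across views corrects it. The number of expert queries equals the number of clusters times the number of views, independent of how many images there are, which is the sense in which the cluster count bounds human involvement.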


