Topological data analysis and machine learning for COVID-19 detection in CT scan lung images.

BMC Biomedical Engineering. Pub Date: 2025-04-02. DOI: 10.1186/s42490-025-00089-1

Rabih Assaf, Abbas Rammal, Alban Goupil, Mohammad Kacim, Valeriu Vrabie

BMC Biomedical Engineering, volume 7 (1), page 4. Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11963280/pdf/


Abstract

COVID-19 has claimed thousands of lives in recent years. Pathogenic laboratory testing is the established diagnostic standard, but it suffers from a notable false-negative rate, so alternative diagnostic approaches are urgently needed. To meet this need for accurate, parameter-free COVID-19 identification in lung images, we introduce an approach that combines topological data analysis with machine learning. The method extracts persistent homology features from lung images, capturing topological properties of the data, and uses these features as inputs to several machine learning classifiers. Our objective is to detect COVID-19 with high accuracy while demonstrating the effectiveness of the topological features. In our experiments, the Random Forest classifier and the Support Vector Machine (SVM) outperform the other models on CT scan lung images: the Random Forest reaches an accuracy of 97.5%, and the SVM an AUC above 0.99. Two further experiments, one applying the model to the same data with the topological features excluded and one applying the same model with topological features to other data, support the effectiveness of these features for the classification task.
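The abstract does not say which filtration, homology dimensions, or feature vectorization the authors use. As a rough illustration of what "persistent homology features from an image" can look like, the sketch below assumes a sublevel-set filtration on pixel intensities with 4-connectivity, computes only 0-dimensional persistence (connected components) with a union-find, and summarizes the bars with three statistics. The function names `sublevel_persistence_h0` and `persistence_features` and the toy image are hypothetical and are not taken from the paper.

```python
INF = float("inf")


def sublevel_persistence_h0(image):
    """0-dimensional sublevel-set persistence of a 2D grayscale image.

    `image` is a list of rows of pixel intensities. Pixels enter the
    filtration in order of increasing intensity (4-connectivity). When two
    components meet, the younger one dies (elder rule). Zero-length bars are
    dropped; the oldest component is reported with death = inf.
    """
    rows, cols = len(image), len(image[0])
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent = {}  # union-find forest over pixels already added
    birth = {}   # birth value of each pixel; meaningful at component roots

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for value, r, c in order:
        node = (r, c)
        parent[node] = node
        birth[node] = value
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbor = (r + dr, c + dc)
            if neighbor not in parent:
                continue
            young, old = find(node), find(neighbor)
            if young == old:
                continue
            if birth[young] < birth[old]:
                young, old = old, young
            if birth[young] < value:  # skip zero-persistence bars
                pairs.append((birth[young], value))
            parent[young] = old  # the elder component survives as root
    pairs.append((order[0][0], INF))  # essential component never dies
    return pairs


def persistence_features(pairs):
    """Summarize finite bars into a small feature dictionary (an assumption,
    not the paper's vectorization)."""
    lifetimes = [death - b for b, death in pairs if death != INF]
    return {
        "n_finite_bars": len(lifetimes),
        "total_persistence": sum(lifetimes),
        "max_persistence": max(lifetimes, default=0),
    }


# Toy 3x3 "image": four dark corners separated by brighter pixels.
toy_image = [
    [0, 5, 1],
    [5, 5, 5],
    [2, 5, 3],
]
toy_pairs = sublevel_persistence_h0(toy_image)
toy_features = persistence_features(toy_pairs)
print(sorted(toy_pairs))
print(toy_features)
```

In a pipeline of the kind the abstract describes, one such feature vector per CT image would be passed to a classifier such as a random forest or an SVM. A fuller treatment would normally also include 1-dimensional features (loops), typically computed with a dedicated TDA library rather than by hand.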

