Topological data analysis and machine learning for COVID-19 detection in CT scan lung images
Rabih Assaf, Abbas Rammal, Alban Goupil, Mohammad Kacim, Valeriu Vrabie
BMC biomedical engineering 7(1):4, published 2025-04-02
DOI: https://doi.org/10.1186/s42490-025-00089-1
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11963280/pdf/
Abstract
COVID-19 has claimed thousands of lives in recent years. Although pathogenic laboratory testing is the established diagnostic standard, it suffers from a notable false-negative rate, so alternative diagnostic approaches are urgently needed. To meet the need for accurate, parameter-free methods of identifying COVID-19 in lung images, we introduce an approach that combines topological data analysis with machine learning. The method extracts persistent homology features from lung images, capturing topological properties of the data, and uses these features as inputs to several machine learning classifiers. Our objective is to detect COVID-19 with high accuracy while demonstrating the effectiveness of the topological features. In our experiments, the Random Forest classifier and the Support Vector Machine (SVM) outperform the other models in classifying CT scan lung images: the Random Forest reaches an accuracy of 97.5%, and the SVM an AUC above 0.99. Two further experiments support the usefulness of the topological features: rerunning the model on the same data with the topological features excluded, and applying the same model with topological features to other data.
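To make the feature-extraction step concrete, here is a minimal sketch of how persistent homology features might be computed from a grayscale image. The abstract does not state which filtration, homology dimensions, or vectorization the authors use, so everything below is an assumption: a sublevel-set filtration, 0-dimensional homology only (connected components), 4-connectivity, and three simple summary statistics. The names `sublevel_h0_persistence` and `persistence_features` are hypothetical.

```python
def sublevel_h0_persistence(image):
    """0-dimensional persistence pairs (birth, death) of a 2D grayscale image.

    Assumption: sublevel-set filtration with 4-connectivity; the paper's
    actual filtration is not specified in the abstract.
    """
    rows, cols = len(image), len(image[0])
    # Pixels enter the filtration in order of increasing intensity.
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent = {}  # union-find forest over the pixels added so far
    birth = {}   # intensity at which each pixel's component was born

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    pairs = []
    for value, r, c in order:
        p = (r, c)
        parent[p] = p
        birth[p] = value
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:
                continue  # neighbour is outside the image or not yet added
            a, b = find(p), find(q)
            if a == b:
                continue
            # Elder rule: the component with the later birth dies at `value`.
            if birth[a] < birth[b]:
                a, b = b, a  # after the swap, `a` is the younger component
            if value > birth[a]:
                pairs.append((birth[a], value))  # skip zero-persistence pairs
            parent[a] = b
    # The component born at the global minimum never dies.
    pairs.append((order[0][0], float("inf")))
    return pairs


def persistence_features(pairs):
    """Illustrative feature vector: count, total and maximum finite persistence."""
    finite = [d - b for b, d in pairs if d != float("inf")]
    return [len(finite), sum(finite), max(finite, default=0.0)]


# Toy image: four dark corners separated by a bright cross.
toy = [[0, 5, 1],
       [5, 5, 5],
       [2, 5, 3]]
toy_pairs = sublevel_h0_persistence(toy)
toy_features = persistence_features(toy_pairs)
```

In a full pipeline, one such feature vector per CT image would form a row of the design matrix passed to a classifier such as a random forest or an SVM. A dedicated topological data analysis library would normally replace this hand-written routine and would also provide higher-dimensional homology (loops), which this sketch omits.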