Comparison of different Lunit INSIGHT CXR software versions when reading chest radiographs for tuberculosis

Andrew J Codlin, Luan N Q Vo, Thang P Dao, Rachel J Forse, Ha T M Dang, Lan H Nguyen, Hoa B Nguyen, Luong V Dinh, Kristi Sidney Annerstedt, Johan Lundin, Knut Lönnroth

PLOS Digital Health 4(4): e0000813. Published 2025-04-15. https://doi.org/10.1371/journal.pdig.0000813
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11999130/pdf/
Abstract
New versions of computer-aided detection (CAD) software for chest X-ray (CXR) interpretation during tuberculosis (TB) screening are released regularly and purport to offer incremental performance gains. No studies have independently assessed differences in performance between versions of the World Health Organization-recommended INSIGHT CXR software (Lunit, South Korea). A well-characterized Digital Imaging and Communications in Medicine (DICOM) test library was compiled using data from a community-based TB screening initiative in Ho Chi Minh City, Viet Nam. The performance of Lunit CAD software versions 3.1.0.0 (older) and 3.9.0.1 (newer) was compared by measuring the area under the receiver operating characteristic curve (AUC), stratified by key clinical and demographic variables and using Xpert MTB/RIF Ultra (Ultra) test results as the reference standard. Median abnormality scores were compared using the Wilcoxon signed-rank test, and performance characteristics of the two versions were compared at clinically relevant cut-off thresholds (e.g., 90% sensitivity). The DICOM test library contained 2,708 participants, of whom 10.3% had a Mycobacterium tuberculosis (MTB) positive Ultra test result. The newer software version had a significantly higher AUC than its predecessor (0.76 for v3.1.0.0 vs 0.78 for v3.9.0.1, p = 0.029), and performed significantly better among people with a past history of TB (0.67 vs 0.73, p = 0.003), older individuals (0.75 vs 0.77, p = 0.040) and males (0.73 vs 0.76, p = 0.008); AUCs are listed as older vs newer version throughout. When using a cut-off threshold optimized for the older software version, the newer software was significantly less accurate than its predecessor. However, when the cut-off threshold was re-calibrated, there were no significant differences in sensitivity and specificity between the software versions.
Although INSIGHT CXR v3.9.0.1 has some significantly improved performance characteristics compared to its predecessor, further studies should assess how these performance differences translate into real-world improvements during TB screening. As new CAD software versions are rolled out, cut-off thresholds must be re-calibrated to ensure the continued accuracy of CXR interpretation.
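The re-calibration point can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the rule of calling a radiograph abnormal when its score is at or above the cut-off, and all scores below are assumptions made for illustration. The toy data are synthetic, and the direction and size of the score shift between versions are arbitrary rather than taken from the study.

```python
import math


def threshold_for_sensitivity(scores, labels, target=0.90):
    """Return the highest cut-off whose sensitivity is still >= target.

    Assumption: a radiograph is called abnormal when score >= cut-off.
    """
    positives = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("need at least one reference-positive case")
    # Number of positives that must be flagged to reach the target;
    # the small epsilon guards against floating-point overshoot in ceil.
    needed = math.ceil(target * len(positives) - 1e-9)
    return positives[needed - 1]


def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity against the reference standard."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic toy data (NOT study data): 10 reference-positive cases,
# then 10 reference-negative cases, scored by two hypothetical versions.
labels = [1] * 10 + [0] * 10
old_scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.50, 0.45, 0.20,
              0.60, 0.50, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05]
new_scores = [0.90, 0.85, 0.75, 0.70, 0.55, 0.45, 0.40, 0.35, 0.30, 0.10,
              0.40, 0.30, 0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05, 0.02]

# Cut-off tuned on the old version, then carried over unchanged.
old_cutoff = threshold_for_sensitivity(old_scores, labels)
carried_over = sensitivity_specificity(new_scores, labels, old_cutoff)

# Cut-off re-calibrated on the new version's own score distribution.
new_cutoff = threshold_for_sensitivity(new_scores, labels)
recalibrated = sensitivity_specificity(new_scores, labels, new_cutoff)

print("old cut-off on new scores:", old_cutoff, carried_over)
print("re-calibrated cut-off:", new_cutoff, recalibrated)
```

On this toy data, the cut-off carried over from the old version yields 60% sensitivity on the new version's scores, while re-calibrating against the new scores restores the 90% target. This mirrors the abstract's point that a threshold is tied to one version's score distribution.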