Combining Radiological and Genomic TB Portals Data for Drug Resistance Analysis

Impact factor 3.4; Zone 3, Computer Science; JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Vy C. B. Bui, Ziv Yaniv, Michael Harris, Feng Yang, Karthik Kantipudi, Darrell Hurt, Alex Rosenthal, Stefan Jaeger
IEEE Access, vol. 11, pp. 84228-84240, published 2023-07-25
DOI: 10.1109/ACCESS.2023.3298750
IEEE Xplore: https://ieeexplore.ieee.org/document/10194254/
Open-access PDF (PMC10473876): https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/72/e4/nihms-1924913.PMC10473876.pdf

Abstract

Tuberculosis (TB) drug resistance is a worldwide public health problem. It decreases the likelihood of a positive outcome for the individual patient and increases the likelihood of disease spread. Therefore, early detection of TB drug resistance is crucial for improving outcomes and controlling disease transmission. While drug-sensitive tuberculosis cases are declining worldwide because of effective treatment, the threat of drug-resistant tuberculosis is growing, and the success rate of drug-resistant tuberculosis treatment is only around 60%. The TB Portals program provides a publicly accessible repository of TB case data with an emphasis on collecting drug-resistant cases. The dataset includes multi-modal information such as socioeconomic/geographic data, clinical characteristics, pathogen genomics, and radiological features. The program is an international collaboration whose participants typically carry a substantial burden of drug-resistant tuberculosis, and its data are collected during the standard clinical care provided to patients. Consequently, the TB Portals dataset is heterogeneous: it represents multiple treatment centers in different countries and contains cross-domain information. This study presents the challenges of working with this real-world dataset and the methods used to address them. Our goal was to evaluate whether combining radiological features derived from the host's chest X-ray with genomic features from the pathogen could improve the identification of the drug susceptibility type, drug-sensitive (DS-TB) or drug-resistant (DR-TB), and the prediction of the length of the first successful drug regimen. These studies required processing significantly imbalanced data: there were many more DR-TB cases than DS-TB cases, many more cases with radiological findings than with genomic data, and the genomic information was sparse and high-dimensional. Three evaluation studies were carried out.
First, the DR-TB/DS-TB classification model achieved an average accuracy of 92.4%, both when using genomic features alone and when combining radiological and genomic features. Second, the regression model for the length of the first successful treatment had a relative error of 53.5% using radiological features, 25.6% using genomic features, and 22.0% using both radiological and genomic features. Finally, the relative error of the third regression model, which predicts the length of the first treatment with the most common drug combination, varied with the feature type used: 17.8% with radiological features alone, 19.9% with genomic features alone, and 19.0% with both feature types combined. Although combining radiological and genomic features did not improve upon genomic features alone when classifying DR-TB/DS-TB, the combination of the two feature types reduced the relative error of the model predicting the length of the first successful treatment. Furthermore, the regression model trained on radiological features achieved the best performance when predicting the treatment length for the most common drug combination.
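The abstract does not describe the preprocessing or define "relative error", so the following is only a minimal Python sketch under stated assumptions, not the authors' pipeline. It assumes that genomic features are binary indicators over a variant vocabulary (capturing their sparse, high-dimensional nature), that the two modalities are fused by simple concatenation for patients who have both, that DR-TB/DS-TB imbalance is countered with inverse-frequency class weights, and that relative error means mean absolute relative error. All function names, patient IDs, and values are hypothetical; the variant names are real resistance-associated mutations used purely as illustrative labels, not as the TB Portals feature set.

```python
from collections import Counter


def encode_variants(variants, vocabulary):
    """Binary presence/absence vector over a fixed (hypothetical) variant vocabulary.

    Genomic features are sparse and high-dimensional: an isolate typically
    carries few of the possible variants, so most entries are 0.
    """
    present = set(variants)
    return [1 if v in present else 0 for v in vocabulary]


def fuse_modalities(radiological, genomic):
    """Assumed early fusion: concatenate per-patient feature vectors.

    Keeps only patients present in both dicts, since far fewer cases
    have genomic data than radiological findings.
    """
    shared_ids = sorted(radiological.keys() & genomic.keys())
    return {pid: radiological[pid] + genomic[pid] for pid in shared_ids}


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Same formula as scikit-learn's class_weight="balanced"; the
    minority class receives the larger weight.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def mean_relative_error(y_true, y_pred):
    """Assumed metric: mean of |y_pred - y_true| / y_true, in percent.

    Targets (e.g., treatment lengths in days) must be strictly positive.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    if any(t <= 0 for t in y_true):
        raise ValueError("relative error needs strictly positive targets")
    return 100 * sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy usage with made-up patients and values (not TB Portals data).
vocab = ["rpoB_S450L", "katG_S315T", "embB_M306V"]
rad = {"p1": [0.2, 1.0], "p2": [0.7, 0.0], "p3": [0.1, 0.5]}
gen = {"p1": encode_variants(["rpoB_S450L"], vocab), "p3": encode_variants([], vocab)}
print(fuse_modalities(rad, gen))
# {'p1': [0.2, 1.0, 1, 0, 0], 'p3': [0.1, 0.5, 0, 0, 0]}
print(balanced_class_weights(["DR-TB"] * 8 + ["DS-TB"] * 2))
# {'DR-TB': 0.625, 'DS-TB': 2.5}
print(round(mean_relative_error([180, 240, 600], [200, 210, 540]), 1))
# 11.2
```

Under this assumed metric, a relative error of 22.0% would mean that predicted treatment lengths miss the true length by roughly a fifth of its value on average. Concatenation is the simplest fusion choice; note that restricting to patients with both modalities shrinks the training set, which is why the gap between radiological and genomic case counts matters.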

Source journal: IEEE Access
Subject categories: COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks