Comprehensive metabolomics combined with machine learning for the identification of SARS-CoV-2 and other viruses directly from upper respiratory samples.
Catherine A Hogan, Anthony T Le, Afraz Khan, LingHui David Su, ChunHong Huang, Malaya K Sahoo, Chieh-Wen Lo, Marwah Karim, Karin Ann Stein, Shirit Einav, Tina M Cowan, Benjamin A Pinsky
Journal of Clinical Microbiology, article e0204224. Journal Article. Published 2025-10-09. DOI: https://doi.org/10.1128/jcm.02042-24
Citation count: 0
Abstract
Metabolic profiling of respiratory samples from individuals with and without respiratory viral infections may identify biomarker signatures that complement routine clinical diagnostic testing and offer unique insights into pathophysiology. We used liquid chromatography quadrupole time-of-flight mass spectrometry to generate untargeted metabolomic profiles and, via machine learning, identified the top biomarker signatures differentiating severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2)-positive from SARS-CoV-2-negative samples. We then adapted these signatures to liquid chromatography-tandem mass spectrometry for targeted profiling and assessed classification performance in a sample set that also included samples positive for other respiratory viruses and samples negative on viral testing. A total of 1,226 samples were tested: 521 positive for SARS-CoV-2, 97 for influenza A, 96 for respiratory syncytial virus (RSV), 211 for other respiratory viruses, and 301 negative. The top-performing model was the Light Gradient Boosting Model, which showed an area under the receiver operating characteristic curve (AUC) of 0.99 (95% confidence interval [CI], 0.99-1.00), sensitivity of 0.96 (95% CI, 0.91-0.99), and specificity of 0.95 (95% CI, 0.90-0.97). A separate machine learning analysis investigating the performance by viral subtype showed high performance for the identification of influenza A virus, with an AUC of 0.97 (95% CI, 0.94-0.99), and RSV, with an AUC of 0.99 (95% CI, 0.97-1.00). The two features with the highest ranking were identified as 3-oxo-heneicosanoic acid and 2-(4-hydroxyphenyl) ethanol. 
These findings extend our understanding of the metabolic impact of respiratory viral infections and support the potential of metabolomics to complement routine clinical diagnostic methods.
Importance
Molecular testing has greatly improved how viruses are diagnosed; however, gaps remain, including limited sensitivity directly from specimens and inability to differentiate active from resolved infection. In this study, we investigated the use of a distinct diagnostic approach, mass spectrometry for detection of metabolites (small molecules) combined with machine learning analysis, for the diagnosis of SARS-CoV-2 and other respiratory viruses. We demonstrated strong performance of this approach directly from upper respiratory swab samples to differentiate SARS-CoV-2-infected versus uninfected individuals. Extension of this approach to influenza and RSV maintained a high level of performance. This research suggests that mass spectrometry-based infectious disease diagnostic testing has clinical potential and that these metabolomic features may reveal novel host-pathogen interactions and therapeutic targets. Applying a similar approach to prospective, multisite cohorts of patients with other infectious diseases carries potential to extend our understanding of the metabolic pathways involved in the host response to infection.
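The abstract reports classifier performance as AUC, sensitivity, and specificity, each with a 95% CI. As a reading aid, the sketch below shows one common way such figures can be computed from binary labels and model scores. It is not the authors' pipeline: the toy labels and scores are invented, the 0.5 decision threshold is an assumption, and the Wilson score interval is only one of several interval methods (the abstract does not state which was used). All function names are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity, (tp, fn, tn, fp)) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp), (tp, fn, tn, fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half (the Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = virus-positive sample, 0 = negative sample.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
threshold = 0.5  # assumed cut-off, not taken from the study
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec, (tp, fn, tn, fp) = sensitivity_specificity(y_true, y_pred)
print("sensitivity", sens, wilson_interval(tp, tp + fn))
print("specificity", spec, wilson_interval(tn, tn + fp))
print("AUC", auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports both kinds of figure.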
About the journal:
The Journal of Clinical Microbiology® disseminates the latest research concerning the laboratory diagnosis of human and animal infections, along with the laboratory's role in epidemiology and the management of infectious diseases.