Genomic cluster formation among invasive group A streptococcal infections in the USA: a whole-genome sequencing and population-based surveillance study.
Yuan Li, Joy Rivers, Saundra Mathis, Zhongya Li, Sopio Chochua, Benjamin J Metcalf, Bernard Beall, Lesley McGee
Lancet Microbe, article 100927. Published 2024-10-14. DOI: https://doi.org/10.1016/S2666-5247(24)00169-1
Abstract
Background: Clusters of invasive group A streptococcal (iGAS) infection, linked to genomically closely related group A streptococcal (GAS) isolates (referred to as genomic clusters), pose public health threats, and are increasingly identified through whole-genome sequencing (WGS) analysis. In this study, we aimed to assess the risk of genomic cluster formation among iGAS cases not already part of existing genomic clusters.
Methods: In this WGS and population-based surveillance study, we analysed iGAS case isolates from the Active Bacterial Core surveillance (ABCs), which is part of the US Centers for Disease Control and Prevention's Emerging Infections Program, in ten US states from Jan 1, 2015, to Dec 31, 2019. We included all residents in ABCs sites with iGAS infections meeting the case definition and excluded non-conforming GAS infections and cases with whole-genome assemblies of the isolate containing fewer than 1·5 million total bases or more than 150 contigs. For iGAS cases, we collected basic demographics, underlying conditions, and risk factors for infection from medical records, and for isolates, we included emm types, antimicrobial resistance, and presence of virulence-related genes. Two iGAS cases were defined as genomically clustered if their isolates differed by three or fewer single-nucleotide variants. An iGAS case not clustered with any previous cases at the time of detection, with a minimum trace-back time of 1 year, was defined as being at risk of cluster formation. We monitored each iGAS case at risk for a minimum of 1 year to identify any cluster formation event, defined as the detection of a subsequent iGAS case clustered with the case at risk. We used the Kaplan-Meier method to estimate the cumulative incidence of cluster formation events over time. We used Cox regression to assess associations between features of cases at risk upon detection and subsequent cluster formation. We developed a random survival forest machine-learning model based on a derivation cohort (random selection of 50% of cases at risk) to predict cluster formation risk. This model was validated using a validation cohort consisting of the remaining 50% of cases at risk.
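The case definitions in the Methods can be illustrated with a minimal Python sketch. It is not the study's pipeline: the `Case` record, the `snv` lookup of pairwise distances, the handling of same-day ties, and the toy data are all assumptions made for illustration. Only the three-variant threshold, the 1-year trace-back and follow-up windows, and the surveillance period come from the abstract. The sketch flags cases at risk, records the first cluster formation event, and applies a plain Kaplan-Meier product-limit estimate of cumulative incidence.

```python
from dataclasses import dataclass
from datetime import date, timedelta

SNV_THRESHOLD = 3                  # abstract: clustered if isolates differ by <= 3 SNVs
TRACE_BACK = timedelta(days=365)   # abstract: minimum trace-back time of 1 year
FOLLOW_UP = timedelta(days=365)    # abstract: minimum follow-up of 1 year


@dataclass(frozen=True)
class Case:                        # hypothetical record layout
    case_id: str
    detected: date


def clustered(a, b, snv):
    """snv maps frozenset({id_a, id_b}) -> SNV distance (hypothetical input)."""
    return snv.get(frozenset((a.case_id, b.case_id)), float("inf")) <= SNV_THRESHOLD


def follow_up_records(cases, snv, study_start, study_end):
    """Return (case_id, days_observed, event) for every case at risk."""
    ordered = sorted(cases, key=lambda c: c.detected)  # ties keep input order (assumption)
    records = []
    for i, case in enumerate(ordered):
        # Require complete trace-back and follow-up windows inside the study period.
        if case.detected - TRACE_BACK < study_start or case.detected + FOLLOW_UP > study_end:
            continue
        # A case clustered with any previous case is not at risk of forming a new cluster.
        if any(clustered(case, prev, snv) for prev in ordered[:i]):
            continue
        later = [c for c in ordered[i + 1:] if clustered(case, c, snv)]
        if later:   # event: first subsequent case clustered with the case at risk
            records.append((case.case_id, (later[0].detected - case.detected).days, True))
        else:       # censored at the end of surveillance
            records.append((case.case_id, (study_end - case.detected).days, False))
    return records


def cumulative_incidence(records, horizon_days):
    """1 minus the Kaplan-Meier survival estimate at horizon_days."""
    survival = 1.0
    event_times = sorted({d for _, d, e in records if e and d <= horizon_days})
    for t in event_times:
        at_risk = sum(1 for _, d, _ in records if d >= t)
        events = sum(1 for _, d, e in records if e and d == t)
        survival *= 1 - events / at_risk
    return 1 - survival


# Toy data (invented): A and B differ by 2 SNVs, D and E by 5, C matches nothing.
cases = [Case("A", date(2016, 3, 1)), Case("B", date(2016, 3, 31)),
         Case("C", date(2016, 6, 1)), Case("D", date(2016, 7, 1)),
         Case("E", date(2016, 7, 11))]
snv = {frozenset(("A", "B")): 2, frozenset(("D", "E")): 5}
records = follow_up_records(cases, snv, date(2015, 1, 1), date(2019, 12, 31))
print(records)
print(cumulative_incidence(records, 30))
```

In the toy data, case B is excluded from the at-risk set because it clusters with the earlier case A, which mirrors the abstract's restriction to cases not already part of an existing genomic cluster.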
Findings: We identified 2764 iGAS cases at risk from 2016 to 2018, of which 656 (24%) formed genomic clusters by the end of 2019. Overall, the cumulative incidence of cluster formation was 0·057 (95% CI 0·048-0·066) at 30 days after detection, 0·12 (0·11-0·13) at 90 days after detection, and 0·16 (0·15-0·18) at 180 days after detection. A higher risk of cluster formation was associated with emm type (adjusted hazard ratio as compared with emm89 was 2·37 [95% CI 1·71-3·30] for emm1, 2·72 [1·82-4·06] for emm3, 2·28 [1·49-3·51] for emm6, 1·47 [1·05-2·06] for emm12, and 2·21 [1·38-3·56] for emm92), homelessness (1·42 [1·01-1·99]), injection drug use (2·08 [1·59-2·72]), residence in a long-term care facility (1·78 [1·29-2·45]), and the autumn-winter season (1·34 [1·14-1·57]) in multivariable Cox regression analysis. The machine-learning model stratified the validation cohort (n=1382) into groups at low (n=370), moderate (n=738), and high (n=274) risk. The 90-day risk of cluster formation was 0·03 (95% CI 0·01-0·05) for the group at low risk, 0·10 (0·08-0·13) for the group at moderate risk, and 0·21 (0·17-0·25) for the group at high risk. These results were consistent with the cross-validation outcomes in the derivation cohort.
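The counts reported in the Findings can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative.

```python
# Counts taken directly from the Findings paragraph.
cases_at_risk = 2764
formed_clusters = 656
validation = {"low": 370, "moderate": 738, "high": 274}

# Share of cases at risk that formed a genomic cluster by the end of 2019.
share = formed_clusters / cases_at_risk
print(f"{share:.1%}")

# The three risk groups should add up to the validation cohort (n=1382),
# which in turn should be half of all cases at risk.
validation_total = sum(validation.values())
print(validation_total, validation_total * 2 == cases_at_risk)
```

The share rounds to the reported 24%, and the risk groups sum to 1382, exactly half of the 2764 cases at risk, which is consistent with the 50/50 split into derivation and validation cohorts.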
Interpretation: Using population-based surveillance data, we found that pathogen, host, and environment factors of iGAS cases were associated with increased likelihood of subsequent genomic cluster formation. Groups at high risk were consistently identified by a predictive model which could inform prevention strategies, although future work to refine the model, incorporating other potential risk factors such as host contact patterns and immunity to GAS, is needed to improve its predictive performance.
Funding: Centers for Disease Control and Prevention.
About the journal:
The Lancet Microbe is a gold open access journal committed to publishing content relevant to clinical microbiologists worldwide, with a focus on studies that advance clinical understanding, challenge the status quo, and advocate change in health policy.