Yuan Li, Joy Rivers, Saundra Mathis, Zhongya Li, Sopio Chochua, Benjamin J Metcalf, Bernard Beall, Lesley McGee
Lancet Microbe (Journal Article; impact factor 20.9; JCR Q1, Infectious Diseases). Published 2024-10-14. DOI: 10.1016/S2666-5247(24)00169-1 (https://doi.org/10.1016/S2666-5247(24)00169-1)
Citations: 0
Abstract
Genomic cluster formation among invasive group A streptococcal infections in the USA: a whole-genome sequencing and population-based surveillance study.
Background: Clusters of invasive group A streptococcal (iGAS) infection, linked to genomically closely related group A streptococcal (GAS) isolates (referred to as genomic clusters), pose public health threats, and are increasingly identified through whole-genome sequencing (WGS) analysis. In this study, we aimed to assess the risk of genomic cluster formation among iGAS cases not already part of existing genomic clusters.
Methods: In this WGS and population-based surveillance study, we analysed iGAS case isolates from the Active Bacterial Core surveillance (ABCs), which is part of the US Centers for Disease Control and Prevention's Emerging Infections Program, in ten US states from Jan 1, 2015, to Dec 31, 2019. We included all residents in ABCs sites with iGAS infections meeting the case definition and excluded non-conforming GAS infections and cases with whole-genome assemblies of the isolate containing fewer than 1·5 million total bases or more than 150 contigs. For iGAS cases we collected basic demographics, underlying conditions, and risk factors for infection from medical records, and for isolates we included emm types, antimicrobial resistance, and presence of virulence-related genes. Two iGAS cases were defined as genomically clustered if their isolates differed by three or fewer single-nucleotide variants. An iGAS case not clustered with any previous cases at the time of detection, with a minimum trace-back time of 1 year, was defined as being at risk of cluster formation. We monitored each iGAS case at risk for a minimum of 1 year to identify any cluster formation event, defined as the detection of a subsequent iGAS case clustered with the case at risk. We used the Kaplan-Meier method to estimate the cumulative incidence of cluster formation events over time. We used Cox regression to assess associations between features of cases at risk upon detection and subsequent cluster formation. We developed a random survival forest machine-learning model based on a derivation cohort (random selection of 50% of cases at risk) to predict cluster formation risk. This model was validated using a validation cohort consisting of the remaining 50% of cases at risk.
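The case definitions in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Case` class, the `snv` distance callback, the day-based time scale, and the handling of same-day cases are all assumptions made for illustration. Only the three-variant threshold, the 1-year trace-back, the 1-year minimum follow-up, and the use of the Kaplan-Meier method come from the abstract.

```python
from dataclasses import dataclass

SNV_THRESHOLD = 3  # from the abstract: clustered if isolates differ by <= 3 SNVs


@dataclass
class Case:
    case_id: str
    day: int  # detection day counted from the start of surveillance (assumed scale)


def time_to_cluster(cases, snv, study_end, traceback=365, follow_up=365):
    """Return (case_id, days_observed, event) for every case at risk.

    `snv(id_a, id_b)` is a hypothetical callback giving the pairwise SNV distance.
    """
    ordered = sorted(cases, key=lambda c: c.day)
    rows = []
    for i, case in enumerate(ordered):
        if case.day < traceback or case.day + follow_up > study_end:
            continue  # too little trace-back or follow-up time
        if any(snv(case.case_id, p.case_id) <= SNV_THRESHOLD for p in ordered[:i]):
            continue  # already clustered with a previous case, so not at risk
        later = [q.day - case.day for q in ordered[i + 1:]
                 if snv(case.case_id, q.case_id) <= SNV_THRESHOLD]
        if later:
            rows.append((case.case_id, min(later), True))  # cluster formation event
        else:
            rows.append((case.case_id, study_end - case.day, False))  # censored
    return rows


def km_cumulative_incidence(rows, horizon):
    """One minus the Kaplan-Meier survival estimate at `horizon` days."""
    survival = 1.0
    event_times = sorted({t for _, t, event in rows if event and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for _, u, _ in rows if u >= t)
        events = sum(1 for _, u, event in rows if event and u == t)
        survival *= 1 - events / at_risk
    return 1 - survival
```

Calling `km_cumulative_incidence(rows, 30)`, `(rows, 90)`, and `(rows, 180)` on the output of `time_to_cluster` would give estimates of the same kind as the 30-day, 90-day, and 180-day figures reported in the Findings, though the abstract does not state how ties or cluster membership beyond pairwise distance were handled.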
Findings: We identified 2764 iGAS cases at risk from 2016 to 2018, of which 656 (24%) formed genomic clusters by the end of 2019. Overall, the cumulative incidence of cluster formation was 0·057 (95% CI 0·048-0·066) at 30 days after detection, 0·12 (0·11-0·13) at 90 days after detection, and 0·16 (0·15-0·18) at 180 days after detection. A higher risk of cluster formation was associated with emm type (adjusted hazard ratio as compared with emm89 was 2·37 [95% CI 1·71-3·30] for emm1, 2·72 [1·82-4·06] for emm3, 2·28 [1·49-3·51] for emm6, 1·47 [1·05-2·06] for emm12, and 2·21 [1·38-3·56] for emm92), homelessness (1·42 [1·01-1·99]), injection drug use (2·08 [1·59-2·72]), residence in a long-term care facility (1·78 [1·29-2·45]), and the autumn-winter season (1·34 [1·14-1·57]) in multivariable Cox regression analysis. The machine-learning model stratified the validation cohort (n=1382) into groups at low (n=370), moderate (n=738), and high (n=274) risk. The 90-day risk of cluster formation was 0·03 (95% CI 0·01-0·05) for the group at low risk, 0·10 (0·08-0·13) for the group at moderate risk, and 0·21 (0·17-0·25) for the group at high risk. These results were consistent with the cross-validation outcomes in the derivation cohort.
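The reported figures can be cross-checked with a few lines of arithmetic. The Python sketch below copies the numbers from the Findings paragraph (writing the Lancet mid-dot decimals with ordinary points) and checks the 24% proportion, the 50% validation split, and whether each adjusted hazard ratio sits at the geometric midpoint of its confidence interval. The last check assumes the intervals are symmetric on the log scale, as Wald-type intervals from Cox regression usually are; the abstract does not say how they were computed.

```python
import math

# Figures copied from the Findings paragraph.
at_risk, formed = 2764, 656
groups = {"low": 370, "moderate": 738, "high": 274}
hazard_ratios = {  # name: (adjusted HR, lower 95% bound, upper 95% bound)
    "emm1": (2.37, 1.71, 3.30),
    "emm3": (2.72, 1.82, 4.06),
    "emm6": (2.28, 1.49, 3.51),
    "emm12": (1.47, 1.05, 2.06),
    "emm92": (2.21, 1.38, 3.56),
    "homelessness": (1.42, 1.01, 1.99),
    "injection drug use": (2.08, 1.59, 2.72),
    "long-term care facility": (1.78, 1.29, 2.45),
    "autumn-winter season": (1.34, 1.14, 1.57),
}


def percent(part, whole):
    """Share of `whole` made up by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


def log_midpoint(lower, upper):
    """Geometric mean of the bounds; matches the point estimate when the
    interval is symmetric on the log scale (an assumption, see above)."""
    return math.sqrt(lower * upper)


print("cases forming clusters (%):", percent(formed, at_risk))
print("validation cohort size:", sum(groups.values()), "of", at_risk)
for name, (hr, lower, upper) in hazard_ratios.items():
    print(f"{name}: reported {hr}, midpoint {log_midpoint(lower, upper):.2f}")
```

Under that assumption every reported hazard ratio lies within rounding distance of its interval midpoint, and the three risk groups add up to exactly half of the cases at risk.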
Interpretation: Using population-based surveillance data, we found that pathogen, host, and environment factors of iGAS cases were associated with increased likelihood of subsequent genomic cluster formation. Groups at high risk were consistently identified by a predictive model which could inform prevention strategies, although future work to refine the model, incorporating other potential risk factors such as host contact patterns and immunity to GAS, is needed to improve its predictive performance.
Funding: Centers for Disease Control and Prevention.
About the journal:
The Lancet Microbe is a gold open access journal committed to publishing content relevant to clinical microbiologists worldwide, with a focus on studies that advance clinical understanding, challenge the status quo, and advocate change in health policy.