Exploring uncatalogued genetic variation in antimicrobial resistance gene families in Escherichia coli: an observational analysis

IF 20.9 | Zone 1, Biology | JCR Q1, Infectious Diseases

Lancet Microbe Pub Date : 2024-11-01 DOI:10.1016/S2666-5247(24)00152-6

Samuel Lipworth DPhil , Prof Derrick Crook FRCP , Prof A Sarah Walker PhD , Prof Tim Peto FRCP , Prof Nicole Stoesser DPhil

Lancet Microbe, volume 5, issue 11, Article 100913.


Abstract

Background

Antimicrobial resistance (AMR) in Escherichia coli is a global problem associated with substantial morbidity and mortality. AMR-associated genes are typically annotated based on similarity to variants in a curated reference database, with the implicit assumption that uncatalogued genetic variation within these is phenotypically unimportant. In this study, we evaluated the performance of the AMRFinder tool and, subsequently, the potential for discovering new AMR-associated gene families and characterising variation within existing ones to improve genotype-to-susceptibility phenotype predictions in E coli.

Methods

In this cross-sectional study of international genome sequence data, we assembled a global dataset of 9001 E coli sequences from five publicly available data collections, predominantly derived from human bloodstream infections, from Norway, Oxfordshire (UK), Thailand, the UK, and Sweden. Of these sequences, 8555 had linked antibiotic susceptibility data. Raw reads were assembled using Shovill, and AMR genes (relevant to amoxicillin–clavulanic acid, ampicillin, ceftriaxone, ciprofloxacin, gentamicin, piperacillin–tazobactam, and trimethoprim) were extracted using the National Center for Biotechnology Information AMRFinder tool (using both default and strict [100%] coverage and identity filters). We assessed the value of the presence of these genes for predicting resistance or susceptibility against US Food and Drug Administration thresholds for major and very major errors. Mash was used to calculate the similarity between extracted genes using Jaccard distances. We empirically reclustered extracted gene sequences into AMR-associated gene families (≥70% match) and antibiotic-resistance genes (ARGs; 100% match) and categorised these according to their frequency in the dataset. Accumulation curves were simulated, and correlations between gene frequency in the Oxfordshire and other datasets were calculated using the Spearman coefficient. Firth regression was used to model the association between the presence of bla_TEM-1 variants and amoxicillin–clavulanic acid or piperacillin–tazobactam resistance, adjusted for the presence of other relevant ARGs.
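
As a rough illustration of the error metrics named above (a sketch, not the authors' code): by the usual convention, a very major error is a phenotypically resistant isolate predicted susceptible from genotype, and a major error is a phenotypically susceptible isolate predicted resistant. The function name `error_rates` and the toy data are hypothetical, and no regulatory threshold values are assumed here.

```python
from typing import Dict, Sequence


def error_rates(
    predicted_resistant: Sequence[bool],
    phenotype_resistant: Sequence[bool],
) -> Dict[str, float]:
    """Compute very major and major error rates for genotype-based predictions.

    Very major error rate: resistant isolates predicted susceptible,
    divided by all phenotypically resistant isolates.
    Major error rate: susceptible isolates predicted resistant,
    divided by all phenotypically susceptible isolates.
    """
    if len(predicted_resistant) != len(phenotype_resistant):
        raise ValueError("inputs must have the same length")

    n_resistant = sum(1 for y in phenotype_resistant if y)
    n_susceptible = len(phenotype_resistant) - n_resistant

    pairs = list(zip(predicted_resistant, phenotype_resistant))
    very_major = sum(1 for pred, pheno in pairs if pheno and not pred)
    major = sum(1 for pred, pheno in pairs if not pheno and pred)

    nan = float("nan")
    return {
        "very_major_error_rate": very_major / n_resistant if n_resistant else nan,
        "major_error_rate": major / n_susceptible if n_susceptible else nan,
    }


# Hypothetical toy data: five isolates, True means resistant.
predicted = [True, True, False, False, True]
phenotype = [True, True, True, False, False]
rates = error_rates(predicted, phenotype)
```

In practice `predicted_resistant` would be derived per antibiotic from the presence of at least one relevant ARG, under whichever coverage and identity filter is being evaluated.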

Findings

The performance of the AMRFinder database for genotype-to-phenotype predictions using strict 100% identity and coverage thresholds did not meet US Food and Drug Administration thresholds for any of the seven antibiotics evaluated. Relaxing filters to default settings improved sensitivity with a specificity cost. For all antibiotics, most explainable resistance was associated with the presence of a small number of genes. There was a proportion of resistance that could not be explained by known ARGs; this ranged from 75·1% for amoxicillin–clavulanic acid to 3·4% for ciprofloxacin. Only 18 199 (51·5%) of the 35 343 ARGs detected had a 100% identity and coverage match in the AMRFinder database. After empirically reclassifying genes at 100% nucleotide sequence identity, we identified 1042 unique ARGs, of which 126 (12·1%) were present ten times or more, 313 (30·0%) were present between two and nine times, and 603 (57·9%) were present only once. Simulated accumulation curves revealed that discovery of new (100% match) ARGs present more than once in the dataset plateaued relatively quickly, whereas new singleton ARGs were discovered even after many thousands of isolates had been included. We identified a strong correlation (Spearman coefficient 0·76 [95% CI 0·73–0·80], p<0·0001) between the number of times an ARG was observed in Oxfordshire and the number of times it was seen internationally, with ARGs that were observed six times in Oxfordshire always being found elsewhere. Finally, using the example of bla_TEM-1, we showed that uncatalogued variation, including synonymous variation, is associated with potentially important phenotypic differences; for example, two common, uncatalogued bla_TEM-1 alleles with only synonymous mutations compared with the known reference were associated with reduced resistance to amoxicillin–clavulanic acid (adjusted odds ratio 0·58 [95% CI 0·35–0·95], p=0·031) and piperacillin–tazobactam (0·50 [95% CI 0·29–0·82], p=0·005).
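
The accumulation-curve idea can be sketched as follows, assuming each isolate is represented as a set of exact-match allele identifiers. This is an illustrative simulation only, not the authors' implementation; the function names, the permutation count, and the toy isolates are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Set


def accumulation_curve(isolates: Sequence[Set[str]], rng: random.Random) -> List[int]:
    """Cumulative number of distinct alleles seen as isolates are added in random order."""
    order = list(isolates)
    rng.shuffle(order)
    seen: Set[str] = set()
    curve = []
    for alleles in order:
        seen.update(alleles)
        curve.append(len(seen))
    return curve


def mean_accumulation(
    isolates: Sequence[Set[str]], n_permutations: int = 100, seed: int = 0
) -> List[float]:
    """Average the accumulation curve over several random orderings of the isolates."""
    rng = random.Random(seed)
    totals = [0] * len(isolates)
    for _ in range(n_permutations):
        for i, value in enumerate(accumulation_curve(isolates, rng)):
            totals[i] += value
    return [total / n_permutations for total in totals]


def recurrent_alleles(isolates: Sequence[Set[str]]) -> Set[str]:
    """Alleles carried by more than one isolate (that is, non-singletons)."""
    counts = Counter(allele for alleles in isolates for allele in alleles)
    return {allele for allele, n in counts.items() if n > 1}


# Hypothetical toy data: three isolates with made-up allele labels.
isolates = [{"a"}, {"a", "b"}, {"c"}]
recurrent = recurrent_alleles(isolates)
# Restrict each isolate to recurrent alleles to trace the non-singleton curve.
recurrent_only = [alleles & recurrent for alleles in isolates]
curve_all = mean_accumulation(isolates, n_permutations=10)
curve_recurrent = mean_accumulation(recurrent_only, n_permutations=10)
```

Comparing the curve for recurrent alleles with the curve for all alleles is one way to see the pattern described above: a plateau for alleles seen more than once, and continued growth driven by singletons.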

Interpretation

We highlight substantial uncatalogued genetic variation with respect to known ARGs, although a relatively small proportion of these alleles are repeatedly observed in a large international dataset, suggesting strong selection pressures. The current approach of using fuzzy matching for ARG detection, ignoring the unknown effects of uncatalogued variation, is unlikely to be acceptable for future clinical deployment. The association of synonymous mutations with potentially important phenotypic differences suggests that relying solely on amino acid-based gene detection to predict resistance is unlikely to be sufficient. Finally, the inability to explain all resistance using existing knowledge highlights the importance of new target gene discovery.

Funding

National Institute for Health and Care Research, Wellcome, and UK Medical Research Council.


Source journal: Lancet Microbe

CiteScore: 27.20

Self-citation rate: 0.80%

Articles published: 278

Review time: 6 weeks

About the journal: The Lancet Microbe is a gold open access journal committed to publishing content relevant to clinical microbiologists worldwide, with a focus on studies that advance clinical understanding, challenge the status quo, and advocate change in health policy.