Genetic constraint at single amino acid resolution in protein domains improves missense variant prioritisation and gene discovery

Impact factor 10.4 | CAS Zone 1 (Biology) | JCR Q1, GENETICS & HEREDITY
Xiaolei Zhang, Pantazis I. Theotokis, Nicholas Li, Caroline F. Wright, Kaitlin E. Samocha, Nicola Whiffin, James S. Ware
Journal article. Genome Medicine, 11 July 2024. DOI: 10.1186/s13073-024-01358-9

Abstract

One of the major hurdles in clinical genetics is interpreting the clinical consequences of germline missense variants in humans. Significant recent advances have leveraged natural variation observed in large human populations to uncover genes or genomic regions that are depleted of such variation, indicative of selection pressure. We refer to this as “genetic constraint”. Although existing genetic constraint metrics have successfully prioritised genes and genomic regions associated with disease, their spatial resolution is too limited to distinguish pathogenic from benign variants within a gene.

We aim to identify missense variants that are significantly depleted in the general human population. Given the size of currently available human populations with exome or genome sequencing data, it is not possible to detect depletion of individual missense variants directly, since at most positions the expected number of observations of a variant is less than one. We instead focus on protein domains, grouping homologous variants with similar functional impacts and examining the depletion of natural variation within these comparable sets. To accomplish this, we develop the Homologous Missense Constraint (HMC) score. Using the Genome Aggregation Database (gnomAD) 125K exome sequencing data, we evaluate genetic constraint at quasi amino-acid resolution by combining signals across protein homologues.

We identify one million possible missense variants under strong negative selection within protein domains. Although our approach annotates only protein domains, it nonetheless allows us to assess 22% of the exome confidently. It precisely distinguishes pathogenic from benign variants for both early-onset and adult-onset disorders. It outperforms existing constraint metrics and pathogenicity meta-predictors in prioritising de novo mutations from probands with developmental disorders (DD). It is also methodologically independent of these metrics, adding power to predict variant pathogenicity when they are used in combination. We demonstrate utility for gene discovery by identifying seven genes newly significantly associated with DD that could act through an altered-function mechanism.

Grouping variants of comparable functional impact is an effective way to evaluate their genetic constraint. HMC is a novel and accurate predictor of missense consequence for improved variant interpretation.
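The abstract's central argument is that a single amino-acid position carries too little expected variation to test on its own, whereas pooling homologous positions across a protein domain family does. The sketch below illustrates only that arithmetic, using a simplified observed/expected (O/E) ratio and a one-sided Poisson test for depletion. It is not the authors' HMC implementation: the `Position` record, the gene labels, the per-position expected counts, and the choice of a Poisson test are all hypothetical assumptions for illustration; in practice expected counts would come from a mutation-rate model, and the published score may use different statistics.

```python
import math
from dataclasses import dataclass


@dataclass
class Position:
    """One aligned position within a protein domain (hypothetical input record)."""
    gene: str        # hypothetical gene label
    observed: int    # missense variants seen in the population at this position
    expected: float  # expected count under neutrality (assumed to be given)


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Poisson(lam), summing terms iteratively."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        total += term
    return total


def pooled_constraint(positions: list[Position]) -> tuple[float, float]:
    """Pool counts across homologous positions (simplified, assumed model).

    Returns (O/E ratio, one-sided Poisson p-value for depletion).
    """
    obs = sum(p.observed for p in positions)
    exp_total = sum(p.expected for p in positions)
    return obs / exp_total, poisson_cdf(obs, exp_total)


if __name__ == "__main__":
    # A single position: even zero observed variants is unremarkable.
    print(pooled_constraint([Position("GENE0", 0, 0.5)]))
    # Twenty homologous positions with one variant observed in total.
    homologues = [Position(f"GENE{i}", 1 if i == 0 else 0, 0.5) for i in range(20)]
    print(pooled_constraint(homologues))
```

With an expected count of 0.5 at one position, observing zero variants gives p = e^-0.5 ≈ 0.61, so depletion cannot be detected. Pooling 20 such homologous positions (expected 10, observed 1) gives an O/E of 0.1 and p = 11e^-10 ≈ 0.0005, which is the kind of gain in power the abstract attributes to grouping variants with comparable functional impact.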
Source journal: Genome Medicine (GENETICS & HEREDITY)
CiteScore: 20.80
Self-citation rate: 0.80%
Articles published: 128
Review time: 6-12 weeks
Journal description: Genome Medicine is an open access journal that publishes outstanding research applying genetics, genomics, and multi-omics to understand, diagnose, and treat disease. Bridging basic science and clinical research, it covers areas such as cancer genomics, immuno-oncology, immunogenomics, infectious disease, microbiome, neurogenomics, systems medicine, clinical genomics, gene therapies, precision medicine, and clinical trials. The journal publishes original research, methods, software, and reviews to serve authors and promote broad interest in the field.