Identifying deleterious noncoding variation through gain and loss of CTCF binding activity

Colby Tubbs, Mary Lauren Benton, Evonne McArthur, John A. Capra, Douglas M. Ruderfer
bioRxiv - Genetics, 2024-09-08. DOI: 10.1101/2024.09.04.609712

Abstract

Noncoding single nucleotide variants (SNVs) are the predominant class of genetic variation in whole genome sequencing and are key drivers of phenotypic variation. However, their functional annotation remains challenging. To address this, we develop a hypothesis-driven functional annotation scheme for CTCF binding sites, motivated by CTCF's critical roles in gene regulation and its extensive profiling in regulatory datasets. We synthesize CTCF's binding patterns at 1,063,879 genomic loci across 214 biological contexts into a summary metric, which we refer to as binding activity. We find that binding activity is significantly enriched for both conserved nucleotides (Pearson R = 0.31, p < 2.2 × 10⁻¹⁶) and sequences containing high-quality CTCF binding motifs (Pearson R = 0.63, p = 2.9 × 10⁻¹²). We then integrate binding activity with high-confidence changes in position weight matrix (PWM) scores. By applying this framework to 1,253,330 SNVs in gnomAD, we explore signatures of selection acting against the disruption of CTCF binding. We find a strong positive relationship between the mutability-adjusted proportion of singletons (MAPS) and the loss of CTCF binding at loci with high in vitro activity (Pearson R = 0.67, p = 1.5 × 10⁻¹⁴). To contextualize these findings, we apply MAPS to other functional classes of variation and find that a subset of 198,149 loss-of-CTCF-binding variants is observed as infrequently as missense variants. This work implicates these thousands of rare noncoding variants that disrupt CTCF binding for further functional studies, while providing a blueprint for the interpretable annotation of noncoding variants.
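The abstract names two quantitative steps without spelling them out: scoring an SNV by its change in PWM score, and summarizing selection with MAPS. The following is a minimal Python sketch of the general form of each, not the authors' pipeline. Assumptions: a uniform background and a small pseudocount for the log-odds PWM, a toy 4-bp motif in place of a real CTCF matrix, a hypothetical 2-bit cutoff in `classify`, and per-context mutability values supplied by the caller. The MAPS part follows the usual recipe: regress the proportion of singletons on mutability among synonymous variants (here weighted by variant count), then report observed minus expected singletons per variant for a class of interest.

```python
import math

BASES = "ACGT"


def log_odds_pwm(ppm, background=0.25, pseudocount=0.01):
    """Convert a position probability matrix (rows = positions, columns = A, C, G, T)
    into a log2-odds position weight matrix against a uniform background (assumed)."""
    pwm = []
    for row in ppm:
        total = sum(row) + 4 * pseudocount
        pwm.append([math.log2(((p + pseudocount) / total) / background) for p in row])
    return pwm


def pwm_score(pwm, seq):
    """Sum the per-position log-odds weights for a motif-length sequence."""
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(seq))


def delta_pwm(pwm, ref_seq, pos, alt):
    """Alt-minus-ref PWM score for an SNV at offset `pos` in the motif window.
    Negative values suggest weaker binding (loss); positive values, stronger (gain)."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return pwm_score(pwm, alt_seq) - pwm_score(pwm, ref_seq)


def classify(delta, threshold=2.0):
    """Hypothetical cutoff (in bits) for calling a high-confidence gain or loss."""
    if delta <= -threshold:
        return "loss"
    if delta >= threshold:
        return "gain"
    return "neutral"


def fit_singleton_model(synonymous):
    """Weighted least-squares fit of proportion of singletons on mutability.
    `synonymous` holds one (mutability, n_variants, n_singletons) tuple per
    sequence context; each context is weighted by its variant count."""
    w_total = sum(n for _, n, _ in synonymous)
    x_bar = sum(n * mu for mu, n, _ in synonymous) / w_total
    y_bar = sum(s for _, _, s in synonymous) / w_total
    sxy = sum(n * (mu - x_bar) * (s / n - y_bar) for mu, n, s in synonymous)
    sxx = sum(n * (mu - x_bar) ** 2 for mu, n, _ in synonymous)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope  # (intercept, slope)


def maps(variant_class, intercept, slope):
    """Mutability-adjusted proportion of singletons for one variant class:
    (observed singletons - expected singletons) / total variants."""
    n_total = sum(n for _, n, _ in variant_class)
    observed = sum(s for _, _, s in variant_class)
    expected = sum(n * (intercept + slope * mu) for mu, n, _ in variant_class)
    return (observed - expected) / n_total


if __name__ == "__main__":
    # Toy 4-bp motif that strongly prefers "AGGC" (real CTCF motifs are ~19 bp).
    ppm = [[0.97, 0.01, 0.01, 0.01],
           [0.01, 0.01, 0.97, 0.01],
           [0.01, 0.01, 0.97, 0.01],
           [0.01, 0.97, 0.01, 0.01]]
    pwm = log_odds_pwm(ppm)
    d = delta_pwm(pwm, "AGGC", 1, "A")
    print(round(d, 2), classify(d))

    # Made-up synonymous counts per context: (mutability, variants, singletons).
    syn = [(0.1, 100, 60), (0.2, 100, 50), (0.3, 100, 40)]
    b0, b1 = fit_singleton_model(syn)
    print(maps([(0.1, 50, 40)], b0, b1))
```

By construction MAPS is close to zero for the synonymous calibration set itself. A positive MAPS means a class carries more singletons than its mutability predicts, the signature of purifying selection that the abstract uses to compare loss-of-binding variants with missense variants.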