Linking regulatory variants to target genes by integrating single-cell multiome methods and genomic distance

IF 31.7 1区 生物学 Q1 GENETICS & HEREDITY
Elizabeth Dorans, Karthik Jagadeesh, Kushal Dey, Alkes L. Price
{"title":"Linking regulatory variants to target genes by integrating single-cell multiome methods and genomic distance","authors":"Elizabeth Dorans, Karthik Jagadeesh, Kushal Dey, Alkes L. Price","doi":"10.1038/s41588-025-02220-3","DOIUrl":null,"url":null,"abstract":"<p>Methods that analyze single-cell paired RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin using sequencing (ATAC-seq) multiome data have shown promise in linking regulatory elements to genes. However, existing methods exhibit low concordance and do not capture the effects of genomic distance. We propose pgBoost, an integrative modeling framework that trains a non-linear combination of existing linking strategies (including genomic distance) on expression quantitative trait locus (eQTL) data to assign a probabilistic score to each candidate single-nucleotide polymorphism–gene link. pgBoost attained higher enrichment than existing methods for evaluation sets derived from eQTL, activity-by-contact, CRISPR and genome-wide association study (GWAS) data. We further determined that restricting pgBoost to features from a focal cell type improved power to identify links relevant to that cell type. We highlight several examples in which pgBoost linked fine-mapped GWAS variants to experimentally validated or biologically plausible target genes that were not implicated by other methods. In conclusion, a non-linear combination of linking strategies improves power to identify target genes underlying GWAS associations.</p>","PeriodicalId":18985,"journal":{"name":"Nature genetics","volume":"9 1","pages":""},"PeriodicalIF":31.7000,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature genetics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1038/s41588-025-02220-3","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
引用次数: 0

Abstract

Methods that analyze single-cell paired RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin using sequencing (ATAC-seq) multiome data have shown promise in linking regulatory elements to genes. However, existing methods exhibit low concordance and do not capture the effects of genomic distance. We propose pgBoost, an integrative modeling framework that trains a non-linear combination of existing linking strategies (including genomic distance) on expression quantitative trait locus (eQTL) data to assign a probabilistic score to each candidate single-nucleotide polymorphism–gene link. pgBoost attained higher enrichment than existing methods for evaluation sets derived from eQTL, activity-by-contact, CRISPR and genome-wide association study (GWAS) data. We further determined that restricting pgBoost to features from a focal cell type improved power to identify links relevant to that cell type. We highlight several examples in which pgBoost linked fine-mapped GWAS variants to experimentally validated or biologically plausible target genes that were not implicated by other methods. In conclusion, a non-linear combination of linking strategies improves power to identify target genes underlying GWAS associations.

Abstract Image

通过整合单细胞多组方法和基因组距离将调控变异体与靶基因联系起来
分析单细胞配对RNA测序(RNA-seq)和使用测序(ATAC-seq)多组数据分析转座酶可及染色质的方法已经显示出将调控元件与基因联系起来的希望。然而,现有的方法显示低一致性,并没有捕捉到基因组距离的影响。我们提出了pgBoost,这是一个综合建模框架,它根据表达数量性状位点(eQTL)数据训练现有连接策略(包括基因组距离)的非线性组合,为每个候选单核苷酸多态性-基因链接分配概率分数。与现有的基于eQTL、接触活性、CRISPR和全基因组关联研究(GWAS)数据的评估集相比,pgBoost获得了更高的富集度。我们进一步确定,将pgBoost限制在聚焦细胞类型的特征上,可以提高识别与该细胞类型相关的链接的能力。我们强调了几个例子,其中pgBoost将精细定位的GWAS变异与实验验证或生物学上合理的靶基因联系起来,而其他方法没有涉及这些基因。总之,连接策略的非线性组合提高了识别GWAS关联的靶基因的能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Nature genetics
Nature genetics 生物-遗传学
CiteScore
43.00
自引率
2.60%
发文量
241
审稿时长
3 months
期刊介绍: Nature Genetics publishes the very highest quality research in genetics. It encompasses genetic and functional genomic studies on human and plant traits and on other model organisms. Current emphasis is on the genetic basis for common and complex diseases and on the functional mechanism, architecture and evolution of gene networks, studied by experimental perturbation. Integrative genetic topics comprise, but are not limited to: -Genes in the pathology of human disease -Molecular analysis of simple and complex genetic traits -Cancer genetics -Agricultural genomics -Developmental genetics -Regulatory variation in gene expression -Strategies and technologies for extracting function from genomic data -Pharmacological genomics -Genome evolution
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信