GrapHiC: An Integrative Graph Based Approach for Imputing Missing Hi-C Reads.

IF 3.4 3区生物学 Q2 BIOCHEMICAL RESEARCH METHODS

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2025-03-01 DOI:10.1109/TCBB.2024.3477909

Ghulam Murtaza, Justin Wagner, Justin M Zook, Ritambhara Singh

{"title":"GrapHiC: An Integrative Graph Based Approach for Imputing Missing Hi-C Reads.","authors":"Ghulam Murtaza, Justin Wagner, Justin M Zook, Ritambhara Singh","doi":"10.1109/TCBB.2024.3477909","DOIUrl":null,"url":null,"abstract":"<p><p>Hi-C experiments allow researchers to study and understand the 3D genome organization and its regulatory function. Unfortunately, sequencing costs and technical constraints severely restrict access to high-quality Hi-C data for many cell types. Existing frameworks rely on a sparse Hi-C dataset or cheaper-to-acquire ChIP-seq data to predict Hi-C contact maps with high read coverage. However, these methods fail to generalize to sparse or cross-cell-type inputs because they do not account for the contributions of epigenomic features or the impact of the structural neighborhood in predicting Hi-C reads. We propose GrapHiC, which combines Hi-C and ChIP-seq in a graph representation, allowing more accurate embedding of structural and epigenomic features. Each node represents a binned genomic region, and we assign edge weights using the observed Hi-C reads. Additionally, we embed ChIP-seq and relative positional information as node attributes, allowing our representation to capture structural neighborhoods and the contributions of proteins and their modifications for predicting Hi-C reads. We show that GrapHiC generalizes better than the current state-of-the-art on cross-cell-type settings and sparse Hi-C inputs. Moreover, we can utilize our framework to impute Hi-C reads even when no Hi-C contact map is available, thus making high-quality Hi-C data accessible for many cell types.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":"409-419"},"PeriodicalIF":3.4000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12034241/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1109/TCBB.2024.3477909","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Hi-C experiments allow researchers to study and understand the 3D genome organization and its regulatory function. Unfortunately, sequencing costs and technical constraints severely restrict access to high-quality Hi-C data for many cell types. Existing frameworks rely on a sparse Hi-C dataset or cheaper-to-acquire ChIP-seq data to predict Hi-C contact maps with high read coverage. However, these methods fail to generalize to sparse or cross-cell-type inputs because they do not account for the contributions of epigenomic features or the impact of the structural neighborhood in predicting Hi-C reads. We propose GrapHiC, which combines Hi-C and ChIP-seq in a graph representation, allowing more accurate embedding of structural and epigenomic features. Each node represents a binned genomic region, and we assign edge weights using the observed Hi-C reads. Additionally, we embed ChIP-seq and relative positional information as node attributes, allowing our representation to capture structural neighborhoods and the contributions of proteins and their modifications for predicting Hi-C reads. We show that GrapHiC generalizes better than the current state-of-the-art on cross-cell-type settings and sparse Hi-C inputs. Moreover, we can utilize our framework to impute Hi-C reads even when no Hi-C contact map is available, thus making high-quality Hi-C data accessible for many cell types.

查看原文本刊更多论文

GrapHiC：一种基于图的综合方法，用于估算缺失的 Hi-C 读数。

Hi-C 实验使研究人员能够研究和了解三维基因组的组织及其调控功能。遗憾的是，测序成本和技术限制严重制约了对许多细胞类型的高质量 Hi-C 数据的获取。现有的框架依赖于稀疏的 Hi-C 数据集或获取成本更低的 ChIP-seq 数据来预测高读数覆盖率的 Hi-C 接触图。然而，这些方法无法推广到稀疏或跨细胞类型的输入，因为它们没有考虑表观基因组特征的贡献或结构邻域对预测 Hi-C 读数的影响。我们提出的 GrapHiC 方法将 Hi-C 和 ChIP-seq 结合到图表示法中，可以更准确地嵌入结构和表观基因组特征。每个节点代表一个二进制基因组区域，我们使用观察到的 Hi-C 读数分配边缘权重。此外，我们还将 ChIP-seq 和相对位置信息嵌入节点属性，从而使我们的表征能够捕捉结构邻域和蛋白质及其修饰对预测 Hi-C 读数的贡献。我们的研究表明，在交叉细胞类型设置和稀疏 Hi-C 输入上，GrapHiC 的通用性优于目前最先进的技术。此外，即使没有 Hi-C 接触图，我们也能利用我们的框架来推算 Hi-C 读数，从而使许多细胞类型都能获得高质量的 Hi-C 数据。可用性：https://github.com/rsinghlab/GrapHiC。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE/ACM Transactions on Computational Biology and Bioinformatics 工程技术-计算机：跨学科应用

CiteScore

7.50

自引率

6.70%

发文量

479

审稿时长

3 months

期刊介绍： IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; and important biological results that are obtained from the use of these methods, programs and databases; the emerging field of Systems Biology, where many forms of data are used to create a computer-based model of a complex biological system