CDACHIE: chromatin domain annotation by integrating chromatin interaction and epigenomic data with contrastive learning.
Authors: Asato Yoshinaga, Osamu Maruyama
Journal: Bioinformatics (Oxford, England)
Journal impact factor: 5.4
DOI: 10.1093/bioinformatics/btaf464 (https://doi.org/10.1093/bioinformatics/btaf464)
Publication date: 2025-09-01
Publication type: Journal Article
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12422559/pdf/
Citations: 0
Abstract
Motivation: Chromatin domain annotation identifies functional genomic regions, such as active and inactive zones, based on epigenomic features like histone modifications, DNA methylation, and chromatin accessibility. While recent methods have utilized both chromatin interaction data (e.g. Hi-C) and epigenomic data, they often overlook the direct relationship between these data types.
Results: In this study, we introduce Chromatin Domain Annotation using Contrastive Learning for Hi-C and Epigenomic Data (CDACHIE), a method for identifying chromatin domains from Hi-C and epigenomic data. Our approach leverages contrastive learning to generate aligned representative vectors for both data types at each genomic bin. The concatenated vectors are then clustered using K-means to classify distinct chromatin domain types. CDACHIE achieves superior performance in Variance Explained, evaluated across gene expression, replication timing, and ChIA-PET data. This highlights its robust ability to integrate semantic associations between Hi-C and epigenomic features within the embedding space.
Availability and implementation: The source code is available on GitHub: https://github.com/maruyama-lab-design/CDACHIE. An archival snapshot of the code used in this study is available on Zenodo: https://doi.org/10.5281/zenodo.15751780. Supplementary data are available at Bioinformatics online.
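
The pipeline outlined in the Results paragraph (aligned per-bin vectors for the two data types, concatenation, K-means, and scoring by Variance Explained) can be illustrated with a small NumPy sketch. This is not the authors' implementation; see the linked repository for that. Several pieces are assumptions made for illustration only: the symmetric InfoNCE loss stands in for whatever contrastive objective CDACHIE actually uses, the random arrays stand in for encoder outputs (no training is performed), the hand-written `kmeans` is a toy version, and `variance_explained` uses a common definition, 1 - (within-label sum of squares / total sum of squares). All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    # Scale each row (one genomic bin) to unit length.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax_rows(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_info_nce(z_hic, z_epi, temperature=0.1):
    # ASSUMPTION: a CLIP-style symmetric InfoNCE loss. The two vectors of the
    # same bin form the positive pair; all other bins act as negatives.
    a, b = l2_normalize(z_hic), l2_normalize(z_epi)
    logits = a @ b.T / temperature
    loss_ab = -np.mean(np.diag(log_softmax_rows(logits)))
    loss_ba = -np.mean(np.diag(log_softmax_rows(logits.T)))
    return 0.5 * (loss_ab + loss_ba)


def kmeans(x, k, n_iter=50, seed=0):
    # Toy Lloyd's algorithm: random initial centers, fixed iteration count.
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)

    def assign(c):
        dist = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as is
                centers[j] = x[labels == j].mean(axis=0)
    return assign(centers)


def variance_explained(signal, labels):
    # ASSUMPTION: VE = 1 - within-label sum of squares / total sum of squares.
    total = ((signal - signal.mean()) ** 2).sum()
    within = sum(
        ((signal[labels == c] - signal[labels == c].mean()) ** 2).sum()
        for c in np.unique(labels)
    )
    return 1.0 - within / total


def run_demo(n_bins=200, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    bin_type = rng.integers(0, 2, size=n_bins)  # hidden domain type per bin
    shift = 3.0 * bin_type[:, None]
    # Synthetic stand-ins for the Hi-C and epigenomic embeddings of each bin.
    z_hic = shift + rng.normal(size=(n_bins, dim))
    z_epi = shift + rng.normal(size=(n_bins, dim))
    # Concatenate the two vectors per bin, then cluster into domain types.
    features = np.concatenate([z_hic, z_epi], axis=1)
    labels = kmeans(features, k=2)
    # Synthetic "gene expression" track used to score the annotation.
    expression = bin_type + 0.1 * rng.normal(size=n_bins)
    return symmetric_info_nce(z_hic, z_epi), variance_explained(expression, labels)


if __name__ == "__main__":
    loss, ve = run_demo()
    print(f"contrastive loss: {loss:.3f}, variance explained: {ve:.3f}")
```

In a real setting the two embedding matrices would come from trained encoders, and the external signal would be a measured track such as gene expression or replication timing rather than synthetic values.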