Enhancing Single-Cell and Bulk Hi-C Data Using a Generative Transformer Model.

IF 3.6, Zone 3 (Biology), JCR Q1 (BIOLOGY)
Ruoying Gao, Thomas N Ferraro, Liang Chen, Shaoqiang Zhang, Yong Chen
Journal: Biology-Basel, vol. 14, no. 3. Published: 2025-03-12. Type: Journal Article.
DOI: https://doi.org/10.3390/biology14030288
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11940666/pdf/
Citations: 0

Abstract


The 3D organization of chromatin in the nucleus plays a critical role in regulating gene expression and maintaining cellular functions in eukaryotic cells. High-throughput chromosome conformation capture (Hi-C) and its derivative technologies have been developed to map genome-wide chromatin interactions at the population and single-cell levels. However, insufficient sequencing depth and high noise levels in bulk Hi-C data, particularly in single-cell Hi-C (scHi-C) data, result in low-resolution contact matrices, thereby limiting diverse downstream computational analyses in identifying complex chromosomal organizations. To address these challenges, we developed a transformer-based deep learning model, HiCENT, to impute and enhance both scHi-C and Hi-C contact matrices. Validation experiments on large-scale bulk Hi-C and scHi-C datasets demonstrated that HiCENT achieves superior enhancement effects compared to five popular methods. When applied to real Hi-C data from the GM12878 cell line, HiCENT effectively enhanced 3D structural features at the scales of topologically associated domains and chromosomal loops. Furthermore, when applied to scHi-C data from five human cell lines, it significantly improved clustering performance, outperforming five widely used methods. The adaptability of HiCENT across different datasets and its capacity to improve the quality of chromatin interaction data will facilitate diverse downstream computational analyses in 3D genome research, single-cell studies and other large-scale omics investigations.

Source journal
Biology-Basel (category: Biological Science)
CiteScore: 5.70
Self-citation rate: 4.80%
Articles published: 1618
Review time: 11 weeks
About the journal: Biology (ISSN 2079-7737) is an international, peer-reviewed, quick-refereeing open access journal of Biological Science published online by MDPI. It publishes reviews, research papers and communications in all areas of biology and at the interface of related disciplines. Its aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of papers. Full experimental details must be provided so that the results can be reproduced. Electronic files with the full details of the experimental procedure, if they cannot be published in the normal way, can be deposited as supplementary material.