Exploring 3D community inconsistency in human chromosome contact networks

Impact factor: 2.6 · JCR Q1 · Mathematics, Interdisciplinary Applications
Dolores Bernenko, Sang Hoon Lee, L. Lizana
Journal: Journal of Physics Complexity
DOI: 10.1088/2632-072X/acef9d (https://doi.org/10.1088/2632-072X/acef9d)
Publication date: 2023-02-28
Publication type: Journal Article
Citations: 0

Abstract

Exploring 3D community inconsistency in human chromosome contact networks
Researchers have developed chromosome capture methods such as Hi-C to better understand DNA’s 3D folding in nuclei. The Hi-C method captures contact frequencies between DNA segment pairs across the genome. When analyzing Hi-C data sets, it is common to group these pairs using standard bioinformatics methods (e.g. PCA). Other approaches handle Hi-C data as weighted networks, where connected node pairs represent DNA segments in 3D proximity. In this representation, one can leverage community detection techniques developed in complex network theory to group nodes into mesoscale communities containing nodes with similar connection patterns. While there are several successful attempts to analyze Hi-C data in this way, it is common to report and study the most typical community structure. But in reality, there are often several valid candidates. Therefore, depending on algorithm design, different community detection methods focusing on slightly different connectivity features may have differing views on the ideal node groupings. In fact, even the same community detection method may yield different results if using a stochastic algorithm. This ambiguity is fundamental to community detection and shared by most complex networks whenever interactions span all scales in the network. This is known as community inconsistency. This paper explores this inconsistency of 3D communities in Hi-C data for all human chromosomes. We base our analysis on two inconsistency metrics, one local and one global, and quantify the network scales where the community separation is most variable. For example, we find that TADs are less reliable than A/B compartments and that nodes with highly variable node-community memberships are associated with open chromatin. Overall, our study provides a helpful framework for data-driven researchers and increases awareness of some inherent challenges when clustering Hi-C data into 3D communities.
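The abstract's remark that even a single stochastic method can return different groupings from run to run can be illustrated with a small sketch. This is not the paper's method and these are not its metrics: the toy 7-node contact matrix, the randomized label-propagation routine, and the co-assignment score 4p(1-p) are all assumptions chosen for illustration. Here p is the fraction of runs in which two nodes share a community, so a pair scores 0 when the runs always agree and 1 when they split evenly.

```python
import random
from itertools import combinations


def toy_contacts():
    """Hypothetical 7-node weighted network: two tight blocks plus a bridge node."""
    n = 7
    w = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if (i < 3 and j < 3) or (3 <= i < 6 and 3 <= j < 6):
            value = 5  # strong contacts inside each block
        elif j == 6:
            value = 2  # node 6 touches both blocks equally
        else:
            value = 1  # weak background contacts between blocks
        w[i][j] = w[j][i] = value
    return w


def label_propagation(weights, seed, max_sweeps=100):
    """Stand-in stochastic method: random update order, random tie-breaking."""
    rng = random.Random(seed)
    n = len(weights)
    labels = list(range(n))  # every node starts in its own community
    for _ in range(max_sweeps):
        order = list(range(n))
        rng.shuffle(order)
        changed = False
        for i in order:
            # Total edge weight from node i to each label among its neighbours.
            score = {}
            for j in range(n):
                if j != i and weights[i][j] > 0:
                    score[labels[j]] = score.get(labels[j], 0) + weights[i][j]
            if not score:
                continue
            best = max(score.values())
            candidates = sorted(lab for lab, s in score.items() if s == best)
            if labels[i] not in candidates:
                labels[i] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    return labels


def local_inconsistency(partitions):
    """Per-node score: mean of 4p(1-p) over all other nodes (illustrative only)."""
    n = len(partitions[0])
    runs = len(partitions)
    together = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j != i:
                p = together[i][j] / runs
                total += 4.0 * p * (1.0 - p)
        scores.append(total / (n - 1))
    return scores


def global_inconsistency(partitions):
    """Network-wide score: average of the per-node scores."""
    scores = local_inconsistency(partitions)
    return sum(scores) / len(scores)


toy = toy_contacts()
runs = [label_propagation(toy, seed) for seed in range(20)]
print("per-node:", [round(s, 2) for s in local_inconsistency(runs)])
print("global:  ", round(global_inconsistency(runs), 2))
```

Because the score compares labels only within each run, it is unaffected by how a given run happens to name its communities. For real Hi-C data one would replace both the toy matrix and the detection routine with the actual contact map and an established community detection method.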
Source journal: Journal of Physics Complexity (Computer Science, Information Systems)
CiteScore: 4.30
Self-citation rate: 11.10%
Articles published: 45
Review time: 14 weeks