Application of coincidence index in the discovery of co-expressed metabolic pathways

IF 1.6 | Zone 4, Biology | JCR Q4, Biochemistry & Molecular Biology

Physical Biology | Pub Date: 2024-08-29 | DOI: 10.1088/1478-3975/ad68b6

João Paulo Cassucci Dos Santos, Odemir Martinez Bruno


Citations: 0

Abstract



Analyzing transcription data requires intensive statistical analysis to obtain useful biological information and knowledge. A significant portion of this data is affected by random noise or even noise intrinsic to the modeling of the experiment. Without robust treatment, the data might not be explored thoroughly, and incorrect conclusions could be drawn. Examining the correlation between gene expression profiles is one way bioinformaticians extract information from transcriptomic experiments. However, the correlation measurements traditionally used have worrisome shortcomings that need to be addressed. This paper compares five already published and experimented-with correlation measurements to the newly developed coincidence index, a similarity measurement that combines Jaccard and interiority indexes and generalizes them to be applied to vectors containing real values. We used microarray and RNA-Seq data from the archaeon Halobacterium salinarum and the bacterium Escherichia coli, respectively, to evaluate the capacity of each correlation/similarity measurement. The utilized method explores the co-expressed metabolic pathways by measuring the correlations between the expression levels of enzymes that share metabolites, represented in the form of a weighted graph. It then searches for local maxima in this graph using a simulated annealing algorithm. We demonstrate that the coincidence index extracts larger, more comprehensive, and more statistically significant pathways for microarray experiments. In RNA-Seq experiments, the results are more limited, but the coincidence index managed the largest percentage of significant components in the graph.
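The abstract does not give the formula for the coincidence index, so the following is only a minimal sketch under stated assumptions. It takes the index to be the product of a multiset Jaccard index (sum of element-wise minima over sum of element-wise maxima) and an interiority (overlap) index (sum of minima over the smaller of the two vector totals), and it handles non-negative values only. The paper's generalization to signed real values may differ in detail. The function names and the `weighted_edges` helper are hypothetical and illustrate how enzyme pairs that share a metabolite might be given edge weights.

```python
from typing import Dict, Iterable, Sequence, Tuple


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    """Reject inputs this simplified sketch does not cover."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("this sketch only handles non-negative values")


def jaccard_index(x: Sequence[float], y: Sequence[float]) -> float:
    """Multiset Jaccard index: sum of minima divided by sum of maxima."""
    _check(x, y)
    overlap = sum(min(a, b) for a, b in zip(x, y))
    union = sum(max(a, b) for a, b in zip(x, y))
    return overlap / union if union > 0 else 0.0


def interiority_index(x: Sequence[float], y: Sequence[float]) -> float:
    """Interiority (overlap) index: sum of minima over the smaller total."""
    _check(x, y)
    overlap = sum(min(a, b) for a, b in zip(x, y))
    smaller = min(sum(x), sum(y))
    return overlap / smaller if smaller > 0 else 0.0


def coincidence_index(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed form of the coincidence index: Jaccard times interiority."""
    return jaccard_index(x, y) * interiority_index(x, y)


def weighted_edges(
    expression: Dict[str, Sequence[float]],
    pairs: Iterable[Tuple[str, str]],
) -> Dict[Tuple[str, str], float]:
    """Hypothetical helper: weight each enzyme pair that shares a metabolite
    by the coincidence index of the two expression profiles."""
    return {
        (a, b): coincidence_index(expression[a], expression[b])
        for a, b in pairs
    }


if __name__ == "__main__":
    # Toy expression profiles (invented values, not data from the paper).
    profiles = {"enzA": [1, 2, 3], "enzB": [2, 2, 2], "enzC": [1, 2, 3]}
    print(weighted_edges(profiles, [("enzA", "enzB"), ("enzA", "enzC")]))
```

Because both factors lie in [0, 1] for non-negative inputs, the product is stricter than either index alone: a pair of profiles scores highly only when they overlap substantially relative to both their union and the smaller profile. The simulated annealing search over the resulting graph is not shown here.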


Source journal

Physical Biology (Biology: Biophysics)

CiteScore: 4.20

Self-citation rate: 0.00%

Articles published: (not listed)

Review time: 3 months

About the journal: Physical Biology publishes articles in the broad interdisciplinary field bridging biology with the physical sciences and engineering. This journal focuses on research in which quantitative approaches – experimental, theoretical and modeling – lead to new insights into biological systems at all scales of space and time, and all levels of organizational complexity. Physical Biology accepts contributions from a wide range of biological sub-fields, including topics such as: molecular biophysics, including single molecule studies, protein-protein and protein-DNA interactions; subcellular structures, organelle dynamics, membranes, protein assemblies, chromosome structure; intracellular processes, e.g. cytoskeleton dynamics, cellular transport, cell division; systems biology, e.g. signaling, gene regulation and metabolic networks; cells and their microenvironment, e.g. cell mechanics and motility, chemotaxis, extracellular matrix, biofilms; cell-material interactions, e.g. biointerfaces, electrical stimulation and sensing, endocytosis; cell-cell interactions, cell aggregates, organoids, tissues and organs; developmental dynamics, including pattern formation and morphogenesis; physical and evolutionary aspects of disease, e.g. cancer progression, amyloid formation; neuronal systems, including information processing by networks, memory and learning; population dynamics, ecology, and evolution; collective action and emergence of collective phenomena.