Density estimation for ordinal biological sequences and its applications.

IF 2.2 | Zone 3 (Physics and Astrophysics) | JCR Q2 (PHYSICS, FLUIDS & PLASMAS)
Wei-Chia Chen, Juannan Zhou, David M McCandlish
Physical Review E, vol. 110 (4-1), 044408. Published 2024-10-01. Journal Article.
DOI: https://doi.org/10.1103/PhysRevE.110.044408
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11605730/pdf/
Citations: 0

Abstract

Biological sequences do not come at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms. Here we propose a method for inferring the probability distribution from which a sample of biological sequences were drawn for the case where the sequences are composed of elements that admit a natural ordering. Our method is based on Bayesian field theory, a physics-based machine learning approach, and can be regarded as a nonparametric extension of the traditional maximum entropy estimate. As an example, we use it to analyze the aneuploidy data pertaining to gliomas from The Cancer Genome Atlas project. In addition, we demonstrate two follow-up analyses that can be performed with the resulting probability distribution. One of them is to investigate the associations among the sequence sites. This provides a way to infer the governing biological grammar. The other is to study the global geometry of the probability landscape, which allows us to look at the problem from an evolutionary point of view. It can be seen that this methodology enables us to learn from a sample of sequences about how a biological system or phenomenon in the real world works.
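
To make the phrase "traditional maximum entropy estimate" concrete, here is a minimal sketch of its simplest form: when only the single-site frequencies are constrained, the maximum entropy distribution is the product of the per-site marginals. This is not the authors' Bayesian field theory method. The function names, the add-one pseudocount, and the toy data (three sites with ordinal states -1, 0, +1, loosely standing in for loss, neutral, and gain) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def site_marginals(samples, states):
    """Smoothed per-site state frequencies (add-one pseudocount, an assumption)."""
    n_sites = len(samples[0])
    total = len(samples) + len(states)
    marginals = []
    for i in range(n_sites):
        counts = Counter(seq[i] for seq in samples)
        marginals.append({s: (counts[s] + 1) / total for s in states})
    return marginals


def independent_maxent(samples, states):
    """Product-of-marginals density over the full sequence space.

    This is the maximum entropy distribution that matches the (smoothed)
    single-site marginals; it ignores all associations among sites.
    """
    marginals = site_marginals(samples, states)
    density = {}
    for seq in product(states, repeat=len(marginals)):
        p = 1.0
        for i, s in enumerate(seq):
            p *= marginals[i][s]
        density[seq] = p
    return density


# Hypothetical toy sample: 4 sequences, 3 sites, ordinal states -1 < 0 < +1.
samples = [(0, 0, 1), (0, 1, 1), (-1, 0, 0), (0, 0, 1)]
states = (-1, 0, 1)
density = independent_maxent(samples, states)
```

Because this estimate treats sites as independent, it cannot capture the associations among sequence sites that the abstract describes as one of the follow-up analyses; it only serves as a baseline against which a richer estimate could be compared.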

Source journal: Physical Review E
Categories: PHYSICS, FLUIDS & PLASMAS; PHYSICS, MATHEMATICAL
CiteScore: 4.50
Self-citation rate: 16.70%
Articles published: 2110
About the journal: Physical Review E (PRE), broad and interdisciplinary in scope, focuses on collective phenomena of many-body systems, with statistical physics and nonlinear dynamics as the central themes of the journal. Physical Review E publishes recent developments in biological and soft matter physics including granular materials, colloids, complex fluids, liquid crystals, and polymers. The journal covers fluid dynamics and plasma physics and includes sections on computational and interdisciplinary physics, for example, complex networks.