Use of 3D chaos game representation to quantify DNA sequence similarity with applications for hierarchical clustering

IF 1.9 | Zone 4 (Mathematics) | JCR Q2 (Biology)
Stephanie Young, Jérôme Gilles
Journal of Theoretical Biology, vol. 596, Article 111972. Published 2024-10-19. Journal Article.
DOI: 10.1016/j.jtbi.2024.111972
https://www.sciencedirect.com/science/article/pii/S0022519324002571
Citations: 0

Abstract

A 3D chaos game is shown to be a useful way to encode DNA sequences. Because matching subsequences in DNA converge in space under 3D chaos game encoding, a DNA sequence's 3D chaos game representation can be used to compare DNA sequences without prior alignment and without truncating or padding any of the sequences. Two proposed methods, inspired by shape-similarity comparison techniques, show that this form of encoding can perform as well as alignment-based techniques for building phylogenetic trees. The first method uses the volume overlap of intersecting spheres; the second uses shape signatures that summarize the coordinates, oriented angles, and oriented distances of the 3D chaos game trajectory. The methods are tested on (1) the first exon of the beta-globin gene for 11 species, (2) mitochondrial DNA from four groups of primates, and (3) a set of synthetic DNA sequences. Simulations show that the proposed methods produce distances that reflect the number of mutation events; additionally, on average, distances resulting from deletion mutations are comparable to those produced by substitution mutations.
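The convergence property mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the construction, so everything specific here is an assumption: the four nucleotides are placed at the vertices of a regular tetrahedron, the walk starts at the origin, and each step moves halfway toward the vertex of the next nucleotide (the usual chaos game rule). The names `VERTICES`, `cgr3d`, and `endpoint_gap` are hypothetical and are not taken from the paper.

```python
import math

# ASSUMPTION: one vertex of a regular tetrahedron per nucleotide.
# The paper's actual vertex assignment is not stated in the abstract.
VERTICES = {
    "A": (1.0, 1.0, 1.0),
    "C": (1.0, -1.0, -1.0),
    "G": (-1.0, 1.0, -1.0),
    "T": (-1.0, -1.0, 1.0),
}


def cgr3d(sequence, start=(0.0, 0.0, 0.0)):
    """Return the 3D chaos game trajectory of a DNA string.

    Each point is the midpoint between the previous point and the
    vertex assigned to the current nucleotide (assumed contraction 1/2).
    """
    point = start
    trajectory = []
    for base in sequence.upper():
        vertex = VERTICES[base]  # KeyError on symbols other than A, C, G, T
        point = tuple((p + v) / 2.0 for p, v in zip(point, vertex))
        trajectory.append(point)
    return trajectory


def endpoint_gap(seq_a, seq_b):
    """Euclidean distance between the final trajectory points."""
    return math.dist(cgr3d(seq_a)[-1], cgr3d(seq_b)[-1])


if __name__ == "__main__":
    shared = "GATTACA"
    # Different prefixes of different lengths, same 7-base suffix.
    before = endpoint_gap("ACGT", "TTTTTT")
    after = endpoint_gap("ACGT" + shared, "TTTTTT" + shared)
    print("gap before shared suffix:", before)
    print("gap after shared suffix: ", after)
```

Under these assumptions, every shared nucleotide halves the distance between the two trajectories, so a shared suffix of length k shrinks the gap by a factor of 2^k regardless of how long the preceding sequences are. This is one way to see why sequences of unequal length can be compared without alignment, truncation, or padding.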


Source journal
CiteScore: 4.20
Self-citation rate: 5.00%
Articles published: 218
Review time: 51 days
About the journal: The Journal of Theoretical Biology is the leading forum for theoretical perspectives that give insight into biological processes. It covers a very wide range of topics and is of interest to biologists in many areas of research, including:
• Brain and Neuroscience
• Cancer Growth and Treatment
• Cell Biology
• Developmental Biology
• Ecology
• Evolution
• Immunology
• Infectious and Non-infectious Diseases
• Mathematical, Computational, Biophysical and Statistical Modeling
• Microbiology, Molecular Biology, and Biochemistry
• Networks and Complex Systems
• Physiology
• Pharmacodynamics
• Animal Behavior and Game Theory
Acceptable papers are those whose significance lies in the biology being presented rather than in the mathematical analysis. Papers that include some data or experimental material bearing on theory will be considered, including those that contain comparative study, statistical data analysis, mathematical proof, computer simulations, experiments, field observations, or even philosophical arguments, all of which are methods to support or reject theoretical ideas. However, there should be a concerted effort to make papers intelligible to biologists in the chosen field.