MMV method: a new approach to compare protein sequences under binary representation.

IF 2.4, Zone 3 (Biology), JCR Q3 (Biochemistry & Molecular Biology)
Jayanta Pal, Soumen Ghosh, Bansibadan Maji, Dilip Kumar Bhattacharya
Journal of Biomolecular Structure & Dynamics, pp. 6563-6569. Published 2025-08-01 (Epub 2024-02-20).
DOI: 10.1080/07391102.2024.2317982 (https://doi.org/10.1080/07391102.2024.2317982)
Citation count: 0

Abstract

This work introduces a new descriptor, the minimal moment vector (MMV), for comparing protein sequences in the frequency domain under their component-wise binary representations. From every sequence, 20 binary component sequences are formed, one for each of the 20 amino acids. Each component sequence is then transformed from the time domain to the frequency domain by applying the Fast Fourier Transform (FFT). Next, the power spectrum calculated from the FFT values of each component sequence is normalized so that its components sum to 1. The descriptor is defined as a 20-component vector composed of the second-order minimal moments calculated from the normalized spectra of the 20 component sequences. Once the descriptors are known, the distance matrix is created by applying the Euclidean distance measure. The phylogenetic tree is generated by applying the unweighted pair group method with arithmetic mean (UPGMA) algorithm using Molecular Evolutionary Genetics Analysis 11 (MEGA11) software. The datasets used for similarity studies are 9 NADH dehydrogenase 5 (ND5) proteins, 12 baculoviruses, 24 transferrin (TF) proteins, and 50 coronavirus spike proteins. A qualitative measure using rationalized perception is used to assess the effectiveness of the proposed method. A quantitative measure based on symmetric distance (SD) is used to compare the phylogenetic trees of the present method with those obtained by other methods. The phylogenetic trees generated by the proposed technique are on par with their known biological references and are better than those produced by earlier methods.
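The pipeline in the abstract (binary component sequences, power spectrum, normalization, moments, Euclidean distances) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the formula for the "second-order minimal moment", so `second_moment` is a hypothetical placeholder (the sum of squared deviations of the normalized spectrum from its mean). All function names are invented for illustration, the zero-frequency term is kept because the abstract does not say whether it is dropped, and a direct discrete Fourier transform stands in for the FFT (both give the same values).

```python
import cmath
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def indicator_sequences(seq):
    """One 0/1 sequence per amino acid: 1 where that residue occurs."""
    return {aa: [1.0 if ch == aa else 0.0 for ch in seq] for aa in AMINO_ACIDS}

def power_spectrum(signal):
    """|DFT|^2 computed directly in O(n^2); an FFT yields the same values."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def normalize(spectrum):
    """Scale the spectrum so its components sum to 1 (all zeros stay zeros)."""
    total = sum(spectrum)
    return [p / total for p in spectrum] if total > 0 else list(spectrum)

def second_moment(weights):
    """ASSUMPTION: placeholder moment, not the paper's definition."""
    mean = sum(weights) / len(weights)
    return sum((w - mean) ** 2 for w in weights)

def mmv_descriptor(seq):
    """20-component vector: one moment per amino-acid component sequence."""
    components = indicator_sequences(seq)
    return [second_moment(normalize(power_spectrum(components[aa])))
            for aa in AMINO_ACIDS]

def distance_matrix(sequences):
    """Pairwise Euclidean distances between descriptors."""
    descriptors = [mmv_descriptor(s) for s in sequences]
    return [[math.dist(a, b) for b in descriptors] for a in descriptors]

if __name__ == "__main__":
    for row in distance_matrix(["ACDA", "ACDA", "WWYA"]):
        print([round(d, 4) for d in row])
```

The resulting matrix is what a tree-building step would consume. The paper builds its trees in MEGA11; as an alternative outside the paper, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` performs UPGMA clustering on a condensed distance matrix.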

Source journal: Journal of Biomolecular Structure & Dynamics (Biology: Biochemistry & Molecular Biology)
CiteScore: 8.90
Self-citation rate: 9.10%
Articles published: 597
Review time: 2 months
Journal description: The Journal of Biomolecular Structure and Dynamics welcomes manuscripts on biological structure, dynamics, interactions and expression. The Journal is one of the leading publications in high-end computational science, atomic structural biology, bioinformatics, virtual drug design, genomics and biological networks.