Genomic Comparison of Four Metapneumovirus Strains Using Decision Tree, Apriori Algorithm, ClustalW, and Phylogenetic Reconstruction

Sang-Ran Lim, Taeseon Yoon
Proceedings of the 5th International Conference on Bioinformatics Research and Applications
Published: 2018-12-27
DOI: 10.1145/3309129.3309130 (https://doi.org/10.1145/3309129.3309130)
Citations: 1

Abstract

Human metapneumovirus (HMPV) has persistently been the leading causative agent of acute respiratory infections in young children and the elderly worldwide. The respiratory tract illness caused by HMPV carries high morbidity and mortality in children under five and in the immunocompromised. To study the genetic structure of HMPV, this paper conducts a genomic analysis of the nine genes (N, P, M, F, M2-1, M2-2, SH, G, and L) of HMPV subtypes A1, A2, B1, and B2. Using multiple sequence alignment, decision trees, the Apriori algorithm, and phylogenetic reconstruction, the paper investigates genome-level and protein-level differences between HMPV strains. The results indicate that the four HMPV subtypes are highly similar while each displays distinct attributes. The glycoprotein (G) and the small hydrophobic protein (SH) show the most variance among the four subtypes. The Apriori algorithm shows that the amino acids serine and lysine are the most frequent across the four subtypes. With the Apriori algorithm at a window size of 19, the four subtypes show some degree of similarity in their frequencies of lysine (K). By contrast, two clades of HMPV appear to split in their frequencies of serine (S). Hence, the roles of the glycoprotein and the small hydrophobic protein, and the contribution of serine and lysine to the nine polypeptides, are suggested as topics for future research.
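The window-based frequency analysis mentioned above can be illustrated with a minimal sketch. The abstract does not say how the windows were built or which Apriori implementation was used, so the following is an assumption throughout: each protein sequence is cut into non-overlapping windows of 19 residues, each window is treated as one transaction, and only the first Apriori pass (support of single residues) is computed. The function names and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter


def windows(seq, size=19):
    """Split a sequence into non-overlapping windows of `size` residues.

    Assumption: trailing residues that do not fill a whole window are dropped.
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def residue_support(seq, size=19):
    """Return the fraction of windows that contain each residue.

    This is the support of 1-itemsets, i.e. the first pass of Apriori,
    with each window treated as one transaction.
    """
    wins = windows(seq, size)
    if not wins:
        return {}
    counts = Counter()
    for w in wins:
        counts.update(set(w))  # count each residue once per window
    return {aa: n / len(wins) for aa, n in counts.items()}


def frequent_residues(seq, size=19, min_support=0.5):
    """List residues whose support meets `min_support`, most frequent first."""
    support = residue_support(seq, size)
    kept = [aa for aa, s in support.items() if s >= min_support]
    return sorted(kept, key=lambda aa: (-support[aa], aa))


if __name__ == "__main__":
    # Toy sequence for illustration only, not real HMPV data.
    toy = "S" * 19 + "K" * 19
    print(residue_support(toy))
    print(frequent_residues(toy))
```

Comparing the support of S and K computed this way for each subtype would give the kind of per-subtype frequency comparison the abstract describes, although the paper's actual procedure may differ.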