Genomic Comparison of Four Metapneumovirus Strains Using Decision Tree, Apriori Algorithm, ClustalW, and Phylogenetic Reconstruction
Authors: Sang-Ran Lim, Taeseon Yoon
Published in: Proceedings of the 5th International Conference on Bioinformatics Research and Applications, 2018-12-27
DOI: 10.1145/3309129.3309130
Human metapneumovirus (HMPV) is a persistent leading cause of acute respiratory infections in young children and the elderly worldwide. The respiratory tract illness it causes leads to high morbidity and mortality in children under five and in immunocompromised patients. To study the genetic structure of HMPV, this paper conducts a genomic analysis of the nine genes (N, P, M, F, M2-1, M2-2, SH, G, and L) of HMPV subtypes A1, A2, B1, and B2. Using multiple sequence alignment, decision trees, the Apriori algorithm, and phylogenetic reconstruction, it investigates genome-level and protein-level differences between HMPV strains. The results indicate that the four subtypes are highly similar overall while retaining distinct attributes. The glycoprotein (G) and the small hydrophobic protein (SH) show the most variance among the four subtypes. The Apriori algorithm shows that the amino acids serine (S) and lysine (K) are the most frequent across the four subtypes. Under the Apriori algorithm with a window of 19, the four subtypes show some similarity in their lysine frequencies, whereas the two clades of HMPV appear to diverge in their serine frequencies. The roles of the glycoprotein and the small hydrophobic protein, and the contribution of serine and lysine to the nine polypeptides, are therefore suggested as topics for future research.
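The abstract does not say how the Apriori analysis was set up, so the following is only a minimal sketch of one plausible reading: each protein sequence is cut into fixed-length windows (19 residues, assuming "19 window" refers to window length), each window is treated as a transaction of amino acids, and a level-wise Apriori pass reports which residues and residue pairs meet a support threshold. The function names `windows` and `frequent_itemsets`, the non-overlapping windowing, and the `min_support` value are all assumptions made for illustration, not the authors' procedure.

```python
from itertools import combinations


def windows(sequence, size=19):
    """Split a sequence into non-overlapping windows; a short tail is dropped."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Level-wise Apriori over transactions (each an iterable of items).

    Returns a dict mapping each frequent itemset (a frozenset) to its support,
    i.e. the fraction of transactions that contain every item in the set.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    result = {}
    size = 1
    while candidates and size <= max_size:
        support = {c: sum(c <= b for b in baskets) / n for c in candidates}
        frequent = {c: s for c, s in support.items() if s >= min_support}
        result.update(frequent)
        size += 1
        # Join step: merge pairs of frequent sets that give a set of the next size.
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == size}
        # Prune step: keep a candidate only if all its smaller subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
        }
    return result


if __name__ == "__main__":
    # Made-up toy windows, not HMPV data; real input would be windows(protein_sequence).
    toy = ["SKA", "SKT", "SAT", "KSS"]
    for itemset, support in frequent_itemsets(toy, min_support=0.75).items():
        print(sorted(itemset), support)
```

Comparing the support of `S` and `K` across subtypes, computed this way per protein, would be one way to reproduce the kind of serine and lysine frequency comparison the abstract describes.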