Linguistic analysis of the nucleoprotein gene of influenza A virus
A. Skourikhine, T. Burr
Proceedings IEEE International Symposium on Bio-Informatics and Biomedical Engineering, 2000. DOI: 10.1109/BIBE.2000.889607
We apply a linguistic analysis method (N-grams) to classify nucleotide and amino acid sequences of the nucleoprotein (NP) gene of influenza A virus isolated from three hosts and several geographic regions. We considered single-letter frequencies (1-grams), letter-pair frequencies (2-grams) and triplet frequencies (3-grams). Nearest-neighbor and decision-tree classifiers based on 1-, 2- and 3-grams were constructed for NP nucleotide and amino acid sequences, and their classification accuracy was compared with the groupings obtained by phylogenetic analysis. Our results show that, for NP, classifiers that disregard positional information can achieve almost the same high classification accuracy as more complex techniques that use positional information.
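
The N-gram approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`ngram_profile`, `nearest_neighbor`), the Euclidean distance, the use of overlapping N-grams, and the toy sequences and host labels are all assumptions made for illustration. The sketch turns each sequence into a vector of N-gram relative frequencies, which discards positional information, and assigns a query sequence the label of its closest training sequence.

```python
from collections import Counter
from itertools import product
import math


def ngram_profile(seq, n, alphabet="ACGT"):
    """Relative frequencies of all overlapping n-grams over `alphabet`.

    The result has one entry per possible n-gram, in lexicographic order
    of the alphabet. N-grams containing other symbols are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    keys = ["".join(p) for p in product(alphabet, repeat=n)]
    total = sum(counts[k] for k in keys)
    return [counts[k] / total if total else 0.0 for k in keys]


def nearest_neighbor(query, labeled, n=2, alphabet="ACGT"):
    """Label of the training sequence whose n-gram profile is closest
    to the query's profile (Euclidean distance; an assumed metric)."""
    q = ngram_profile(query, n, alphabet)
    best_seq, best_label = min(
        labeled,
        key=lambda item: math.dist(q, ngram_profile(item[0], n, alphabet)),
    )
    return best_label


# Hypothetical toy data: made-up sequences and placeholder host labels,
# not real NP gene sequences.
train = [
    ("AAAAAAAATTAAAA", "host_A"),
    ("GCGCGCGCGCGG", "host_B"),
]

print(nearest_neighbor("AAAATAAAA", train, n=2))
```

For amino acid sequences, the same functions could be called with a 20-letter `alphabet`; the profile then has 20, 400 or 8000 entries for 1-, 2- and 3-grams respectively.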