Performance Analysis of Cross-Assembly of Metatranscriptomic Datasets in Viral Community Studies
Yu.S. Bukin, A.N. Bondaryuk, T.V. Butina
Mathematical Biology and Bioinformatics, published 2023-11-20
DOI: 10.17537/2023.18.418 (https://doi.org/10.17537/2023.18.418)
We compared individual assemblies and cross-assemblies of several metatranscriptomic datasets from endemic Baikal mollusks to assess their use in studying viral communities. Compared with individual dataset assemblies, a Hidden Markov Model-based cross-assembly procedure increases the number of viral contigs (or scaffolds) per sample, the number of virotypes identified, and the average scaffold length per sample. The proportion of viral reads assembled, relative to the total number of reads in each sample, is also higher in cross-assembly. De novo cross-genomic assembly combined with a virus identification algorithm based on a Hidden Markov Model yields a table that gives, for each scaffold, the number of reads from each sample. This table allows samples to be compared by the representation of all viral scaffolds, including those that are not taxonomically identified, i.e. those that have no analogues in the NCBI RefSeq database. Cross-genomic assemblies thus allow comparative analyses that take the latent diversity of viruses into account. We propose a pipeline for metatranscriptomic data analysis that uses de novo cross-genomic assembly to study viral diversity.
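The scaffold-by-sample read table described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the scaffold names, sample names, read counts, and library sizes are all hypothetical, and Bray-Curtis dissimilarity is an assumed choice of comparison metric, since the abstract does not name one. The sketch computes the share of viral reads per sample and a pairwise dissimilarity that uses every viral scaffold, including ones with no taxonomic assignment.

```python
from itertools import combinations

# Hypothetical read-count table: rows are viral scaffolds from a cross-assembly,
# columns are samples. All names and numbers are invented for illustration.
counts = {
    "scaffold_1": {"S1": 120, "S2": 0, "S3": 30},
    "scaffold_2": {"S1": 0, "S2": 80, "S3": 20},
    "scaffold_3": {"S1": 40, "S2": 40, "S3": 0},  # could be a scaffold with no RefSeq match
}
# Hypothetical total number of reads sequenced in each sample.
total_reads = {"S1": 10_000, "S2": 8_000, "S3": 5_000}


def viral_fraction(counts, total_reads):
    """Share of each sample's reads that map to viral scaffolds."""
    return {
        sample: sum(row[sample] for row in counts.values()) / n_reads
        for sample, n_reads in total_reads.items()
    }


def relative_abundance(counts, sample):
    """Per-scaffold read counts of one sample, scaled to sum to 1."""
    column = {scaffold: row[sample] for scaffold, row in counts.items()}
    total = sum(column.values())
    return {sc: (v / total if total else 0.0) for sc, v in column.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (assumed metric)."""
    numerator = sum(abs(p[k] - q[k]) for k in p)
    denominator = sum(p[k] + q[k] for k in p)
    return numerator / denominator if denominator else 0.0


samples = sorted(total_reads)
fractions = viral_fraction(counts, total_reads)
profiles = {s: relative_abundance(counts, s) for s in samples}
distances = {
    (a, b): bray_curtis(profiles[a], profiles[b])
    for a, b in combinations(samples, 2)
}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

Because the profiles are built from scaffold counts rather than taxonomic labels, unidentified scaffolds contribute to the comparison in the same way as identified ones, which is the property the abstract attributes to the cross-assembly table.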