Samaneh Saadat, Z. Safikhani, K. Badie, M. Sadeghi
Progress in Biological Sciences, vol. 114 1, pp. 43-52. Published 2014-05-01. Journal Article.
DOI: https://doi.org/10.22059/PBS.2014.50305
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in biological studies and the vast amount of whole-transcriptome sequencing data now available, an algorithm for assembling transcriptome data is needed. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, contiguous sequences (contigs) are generated using de Bruijn graphs with different k-mer lengths. Then, the mixed set of sequences from the different k-mer lengths is gathered to form the final sequences. Lastly, the contigs are clustered into isoform groups. The proposed algorithm is capable of generating long contigs and accurately clustering them into isoform groups. To evaluate our algorithm, we applied it to a simulated RNA-seq dataset of the rat transcriptome and to a real RNA-seq experiment on the Loricaria gr. cataphracta transcriptome. The correctness of the assembled contigs was more than 95%, and our algorithm was able to reconstruct over 70% of the transcripts to more than 80% of their lengths. This study demonstrates that applying a sophisticated merging method improves transcriptome assembly. The source code is available upon request by emailing the corresponding author.
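The three stages named in the abstract (multi-k de Bruijn contig generation, gathering the contigs, clustering into groups) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is available only on request. All function names are hypothetical, and the merging rule (drop contigs contained in a longer one) and the clustering rule (group contigs that share at least one k-mer) are simple stand-ins assumed here for illustration. The sketch also ignores reverse complements, sequencing errors, and read coverage.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extract_contigs(graph):
    """Spell out maximal non-branching paths; isolated cycles are skipped."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1

    def is_simple(node):
        # A simple node has exactly one predecessor and one successor.
        return in_degree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in list(graph):
        if is_simple(node):
            continue  # paths start only at branching or source nodes
        for nxt in sorted(graph[node]):
            contig = node + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


def assemble_multi_k(reads, k_values):
    """Pool the contigs produced by one de Bruijn graph per k-mer length."""
    contigs = set()
    for k in k_values:
        contigs.update(extract_contigs(build_de_bruijn(reads, k)))
    return contigs


def merge_contained(contigs):
    """Assumed merging rule: drop contigs contained in a longer contig."""
    kept = []
    for contig in sorted(contigs, key=lambda c: (-len(c), c)):
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept


def cluster_by_shared_kmers(contigs, k):
    """Assumed clustering rule: contigs sharing any k-mer join one group."""
    contigs = list(contigs)
    parent = list(range(len(contigs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}
    for idx, contig in enumerate(contigs):
        for i in range(len(contig) - k + 1):
            kmer = contig[i:i + k]
            if kmer in first_owner:
                parent[find(idx)] = find(first_owner[kmer])
            else:
                first_owner[kmer] = idx

    groups = defaultdict(list)
    for idx, contig in enumerate(contigs):
        groups[find(idx)].append(contig)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Toy reads overlapping along one made-up transcript.
    transcript = "ATGCCGTAAGCTTGA"
    reads = [transcript[0:8], transcript[4:12], transcript[7:15]]
    merged = merge_contained(assemble_multi_k(reads, [4, 5]))
    print(merged)
    print(cluster_by_shared_kmers(merged, 5))
```

Pooling contigs across several k values reflects the usual trade-off: small k connects low-coverage regions but branches more at repeats, while large k resolves repeats but fragments the assembly. A realistic merging step would also join contigs that overlap at their ends, not just remove contained ones.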