MinIsoClust

Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
Pub Date: 2020-09-21
DOI: 10.1145/3388440.3412424

S. Behera, J. Deogun, E. Moriyama



Abstract


With the advent of next-generation sequencing technologies, computational transcriptome assembly of RNA-Seq data has become a critical step in many biological and biomedical studies. The accuracy of these transcriptome assembly methods is hindered by the presence of alternatively spliced transcripts (isoforms). Identifying and quantifying isoforms is also essential for understanding complex biological functions, many of which are associated with various diseases. However, when quality reference genomes are not available, clustering isoform sequences using only sequence identities is often difficult because exon composition is heterogeneous among isoforms. Clustering a large number of transcript sequences also requires a scalable technique. In this study, we propose a minwise-hashing-based method, MinIsoClust, for fast and accurate clustering of transcript sequences that can be used to identify groups of isoforms. We tested this new method using simulated datasets. The results demonstrated that MinIsoClust was more accurate than CD-HIT-EST, isONclust, and MMseqs2/Linclust. MinIsoClust also performed better than isONclust and MMseqs2/Linclust in terms of computational time and space efficiency. The source code of MinIsoClust is freely available at https://github.com/srbehera/MinIsoClust.
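The abstract names minwise hashing but does not describe how MinIsoClust applies it. As a generic illustration only, and not the authors' implementation, the sketch below shows how a MinHash signature over k-mers can estimate the Jaccard similarity between two sequences without comparing their full k-mer sets. The function names (`kmers`, `minhash_signature`, `estimate_jaccard`), the k-mer length `k`, the signature size `num_hashes`, and the use of seeded BLAKE2b hashes are all assumptions made for this example.

```python
import hashlib


def kmers(seq, k=5):
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _seeded_hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer under a given seed (hypothetical helper)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, k=5, num_hashes=64):
    """Keep, for each seeded hash function, the minimum hash over the k-mer set."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [
        min(_seeded_hash(km, seed) for km in kmer_set)
        for seed in range(num_hashes)
    ]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Toy sequences (made up for illustration): the second drops a middle segment,
    # loosely mimicking two isoforms that differ by one exon.
    full = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTA"
    skipped = "ATGGCGTACGTTAGGATTACAGGCTA"
    sig_full = minhash_signature(full)
    sig_skipped = minhash_signature(skipped)
    print("estimated Jaccard:", estimate_jaccard(sig_full, sig_skipped))
```

Because each signature has a fixed length, comparing two sequences costs the same regardless of sequence length, which is the usual reason minwise hashing is attractive when the number of transcripts is large. How signatures are grouped into clusters is a separate design choice that this sketch leaves out.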
