{"title":"UnigeneFinder:一个自动管道,从转录组组装的基因调用没有参考基因组。","authors":"Bo Xue, Karine Prado, Seung Yon Rhee, Matt Stata","doi":"10.1002/pld3.70056","DOIUrl":null,"url":null,"abstract":"<p><p>For most species, transcriptome data are much more readily available than genome data. Without a reference genome, gene calling is cumbersome and inaccurate because of the high degree of redundancy in de novo transcriptome assemblies. To simplify and increase the accuracy of de novo transcriptome assembly in the absence of a reference genome, we developed UnigeneFinder. Combining several clustering methods, UnigeneFinder substantially reduces the redundancy typical of raw transcriptome assemblies. This pipeline offers an effective solution to the problem of inflated transcript numbers, achieving a closer representation of the actual underlying genome. UnigeneFinder performs comparably or better, compared with existing tools, on plant species with varying genome complexities. UnigeneFinder is the only available transcriptome redundancy solution that fully automates the generation of primary transcript, coding region, and protein sequences, analogous to those available for high-quality reference genomes. These features, coupled with the pipeline's cross-platform implementation, focus on automation, and an accessible, user-friendly interface, make UnigeneFinder a useful tool for many downstream sequence-based analyses in nonmodel organisms lacking a reference genome, including differential gene expression analysis, accurate ortholog identification, functional enrichments, and evolutionary analyses. UnigeneFinder also runs efficiently both on high-performance computing (HPC) systems and personal computers, further reducing barriers to use.</p>","PeriodicalId":20230,"journal":{"name":"Plant Direct","volume":"9 4","pages":"e70056"},"PeriodicalIF":2.3000,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012387/pdf/","citationCount":"0","resultStr":"{\"title\":\"UnigeneFinder: An Automated Pipeline for Gene Calling From Transcriptome Assemblies Without a Reference Genome.\",\"authors\":\"Bo Xue, Karine Prado, Seung Yon Rhee, Matt Stata\",\"doi\":\"10.1002/pld3.70056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>For most species, transcriptome data are much more readily available than genome data. Without a reference genome, gene calling is cumbersome and inaccurate because of the high degree of redundancy in de novo transcriptome assemblies. To simplify and increase the accuracy of de novo transcriptome assembly in the absence of a reference genome, we developed UnigeneFinder. Combining several clustering methods, UnigeneFinder substantially reduces the redundancy typical of raw transcriptome assemblies. This pipeline offers an effective solution to the problem of inflated transcript numbers, achieving a closer representation of the actual underlying genome. UnigeneFinder performs comparably or better, compared with existing tools, on plant species with varying genome complexities. UnigeneFinder is the only available transcriptome redundancy solution that fully automates the generation of primary transcript, coding region, and protein sequences, analogous to those available for high-quality reference genomes. These features, coupled with the pipeline's cross-platform implementation, focus on automation, and an accessible, user-friendly interface, make UnigeneFinder a useful tool for many downstream sequence-based analyses in nonmodel organisms lacking a reference genome, including differential gene expression analysis, accurate ortholog identification, functional enrichments, and evolutionary analyses. UnigeneFinder also runs efficiently both on high-performance computing (HPC) systems and personal computers, further reducing barriers to use.</p>\",\"PeriodicalId\":20230,\"journal\":{\"name\":\"Plant Direct\",\"volume\":\"9 4\",\"pages\":\"e70056\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-04-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012387/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Plant Direct\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1002/pld3.70056\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/4/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"PLANT SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Plant Direct","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1002/pld3.70056","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"PLANT SCIENCES","Score":null,"Total":0}
UnigeneFinder: An Automated Pipeline for Gene Calling From Transcriptome Assemblies Without a Reference Genome.
For most species, transcriptome data are much more readily available than genome data. Without a reference genome, gene calling is cumbersome and inaccurate because of the high degree of redundancy in de novo transcriptome assemblies. To simplify and increase the accuracy of de novo transcriptome assembly in the absence of a reference genome, we developed UnigeneFinder. Combining several clustering methods, UnigeneFinder substantially reduces the redundancy typical of raw transcriptome assemblies. This pipeline offers an effective solution to the problem of inflated transcript numbers, achieving a closer representation of the actual underlying genome. UnigeneFinder performs comparably or better, compared with existing tools, on plant species with varying genome complexities. UnigeneFinder is the only available transcriptome redundancy solution that fully automates the generation of primary transcript, coding region, and protein sequences, analogous to those available for high-quality reference genomes. These features, coupled with the pipeline's cross-platform implementation, focus on automation, and an accessible, user-friendly interface, make UnigeneFinder a useful tool for many downstream sequence-based analyses in nonmodel organisms lacking a reference genome, including differential gene expression analysis, accurate ortholog identification, functional enrichments, and evolutionary analyses. UnigeneFinder also runs efficiently both on high-performance computing (HPC) systems and personal computers, further reducing barriers to use.
期刊介绍:
Plant Direct is a monthly, sound science journal for the plant sciences that gives prompt and equal consideration to papers reporting work dealing with a variety of subjects. Topics include but are not limited to genetics, biochemistry, development, cell biology, biotic stress, abiotic stress, genomics, phenomics, bioinformatics, physiology, molecular biology, and evolution. A collaborative journal launched by the American Society of Plant Biologists, the Society for Experimental Biology and Wiley, Plant Direct publishes papers submitted directly to the journal as well as those referred from a select group of the societies’ journals.