Anastasiia Zaremba, Małgorzata Marszałek-Zeńczak, Annasha Dutta, Anna Samelak-Czajka, Paulina Jackowiak
A modular pipeline for evidence-integrated genome annotation across species: A case study on Schmidtea mediterranea
Genomics, 117(6), Article 111104. Published 2025-09-08. DOI: 10.1016/j.ygeno.2025.111104
Abstract
Despite advancements in genome annotation tools, challenges persist for non-classical model organisms with limited genomic resources, such as Schmidtea mediterranea. To address these challenges, we developed a flexible and scalable genome annotation pipeline that integrates short-read (Illumina) and long-read (PacBio) sequencing data. The pipeline combines reference-based and de novo assembly methods, effectively handling genomic variability and alternative splicing events. To improve the accuracy of splice-site detection, the pipeline uses splice-junction predictions from the deep-learning model DeepSplice. Functional annotation is then used to filter out low-confidence transcripts and ensure biological relevance. Applying this pipeline to the asexual strain of S. mediterranea revealed thousands of previously undescribed putative genes and transcripts and improved the existing gene models, highlighting its utility for annotating complex, underexplored genomes. The pipeline's modular and comprehensive design makes it adaptable to genome annotation across diverse species, and thus a valuable tool for annotating the genomes of non-model organisms and supporting broader genomic research. The source code and implementation details are available at https://github.com/Norreanea/SmedAnno.
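The abstract does not describe how the DeepSplice predictions are applied to the assembled transcripts. As an illustration only, the sketch below assumes the predictions have already been reduced to a set of accepted junctions and keeps a transcript only if every intron implied by its exons is in that set. The names `junctions`, `filter_transcripts`, and the tuple layouts are hypothetical and are not taken from SmedAnno; coordinates are assumed to be 1-based and inclusive.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layouts (not SmedAnno's data model):
# a junction is (chromosome, strand, intron_start, intron_end), 1-based inclusive;
# a transcript is (chromosome, strand, list of (exon_start, exon_end)).
Junction = Tuple[str, str, int, int]
Transcript = Tuple[str, str, List[Tuple[int, int]]]


def junctions(tx: Transcript) -> List[Junction]:
    """Return the introns implied by consecutive exons of a transcript."""
    chrom, strand, exons = tx
    exons = sorted(exons)
    return [
        (chrom, strand, left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    ]


def filter_transcripts(
    transcripts: Dict[str, Transcript],
    accepted: Set[Junction],
) -> Dict[str, Transcript]:
    """Keep transcripts whose every junction is in the accepted set.

    Single-exon transcripts have no junctions, so all([]) is True and they
    are kept here; a real pipeline might handle them with other evidence.
    """
    return {
        tx_id: tx
        for tx_id, tx in transcripts.items()
        if all(j in accepted for j in junctions(tx))
    }


# Toy example: tx2 uses an intron (201-350) that is not in the accepted set.
txs = {
    "tx1": ("chr1", "+", [(100, 200), (301, 400)]),
    "tx2": ("chr1", "+", [(100, 200), (351, 400)]),
    "tx3": ("chr1", "+", [(500, 800)]),
}
accepted = {("chr1", "+", 201, 300)}
kept = filter_transcripts(txs, accepted)
print(sorted(kept))  # ['tx1', 'tx3']
```

Filtering on exact junction coordinates is the simplest possible rule; a score threshold per junction or tolerance for a few bases of misalignment would be equally plausible choices, and the published pipeline may use neither.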
About the journal:
Genomics is a forum for describing the development of genome-scale technologies and their application to all areas of biological investigation.
As a journal that has evolved with the field that carries its name, Genomics focuses on the development and application of cutting-edge methods, addressing fundamental questions with potential interest to a wide audience. Our aim is to publish the highest quality research and to provide authors with rapid, fair and accurate review and publication of manuscripts falling within our scope.