{"title":"Chromosome-level genome assemblies of five <i>Sinocyclocheilus</i> species.","authors":"Chao Bian, Ruihan Li, Yuqian Ouyang, Junxing Yang, Xidong Mu, Qiong Shi","doi":"10.46471/gigabyte.155","DOIUrl":"10.46471/gigabyte.155","url":null,"abstract":"<p><p><i>Sinocyclocheilus</i>, a genus of tetraploid fishes endemic to Southwest China's karst regions, are classified as second-class nationally protected species due to their fragile habitat. Limited high-quality genomic resources have hampered studies on their phylogenetic relationships and the origin of their polyploidy. Here, we present a high-quality genome assembly of the most abundant <i>Sinocyclocheilus</i> species, the golden-line barbel (<i>Sinocyclocheilus grahami</i>), by integrating PacBio long-read and Hi-C sequencing. The resulting scaffold-level genome assembly is 1.6 Gb long, with a scaffold N50 of 30.7 Mb. We annotated 42,806 protein-coding genes. Also, 93.1% of the assembled genome sequences (about 1.5 Gb) and 93.8% of the total predicted genes were successfully anchored onto 48 chromosomes. Furthermore, we obtained chromosome-level genome assemblies for four other <i>Sinocyclocheilus</i> species (<i>S. anophthalmus</i>, <i>S. maitianheensis</i>, <i>S. anshuiensis</i>, and <i>S. rhinocerous</i>) based on homologous comparisons. These genomic resources will enable in-depth investigations into cave adaptation, improvement of economic values, and conservation of diverse <i>Sinocyclocheilus</i> fishes.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte155"},"PeriodicalIF":0.0,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089701/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficiently constructing complete genomes with CycloneSEQ to fill gaps in bacterial draft assemblies.","authors":"Hewei Liang, Yuanqiang Zou, Mengmeng Wang, Tongyuan Hu, Haoyu Wang, Wenxin He, Yanmei Ju, Ruijin Guo, Junyi Chen, Fei Guo, Tao Zeng, Yuliang Dong, Yuning Zhang, Bo Wang, Chuanyu Liu, Xin Jin, Wenwei Zhang, Xun Xu, Liang Xiao","doi":"10.46471/gigabyte.154","DOIUrl":"https://doi.org/10.46471/gigabyte.154","url":null,"abstract":"<p><p>Current microbial sequencing relies on short-read platforms such as Illumina and DNBSEQ, which are cost-effective and accurate but often produce fragmented draft genomes. Here, we used CycloneSEQ for long-read sequencing of ATCC BAA-835, producing long reads with an average length of 11.6 kbp and an average quality score of 14.4. Hybrid assembly with short-read data resulted in an error rate of only 0.04 mismatches and 0.08 indels per 100 kbp compared with the reference genome. This method, validated across nine species, successfully assembled complete circular genomes. Hybrid assembly significantly enhances genome completeness by using long reads to fill gaps and by accurately assembling multi-copy rRNA genes, unlike assembly from short reads alone. Data subsampling showed that combining over 500 Mbp of short-read data with 100 Mbp of long-read data yields high-quality circular assemblies. CycloneSEQ long reads improve the assembly of circular complete genomes from mixed microbial communities; however, their base quality needs improvement. Integrating DNBSEQ short reads improved accuracy, resulting in complete and accurate assemblies.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte154"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12051259/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144044131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome assembly and annotation of <i>Acropora pulchra</i> from Mo'orea French Polynesia.","authors":"Trinity Conn, Jill Ashey, Ross Cunning, Hollie M Putnam","doi":"10.46471/gigabyte.153","DOIUrl":"https://doi.org/10.46471/gigabyte.153","url":null,"abstract":"<p><p>Reef-building corals are integral ecosystem engineers of tropical reefs but face threats from climate change. Investigating genetic, epigenetic, and environmental factors influencing their adaptation is critical. Genomic resources are essential for understanding coral biology and guiding conservation efforts. However, genomes of the coral genus <i>Acropora</i> are limited to highly studied species. Here, we present the assembly and annotation of the genome and DNA methylome of <i>Acropora pulchra</i> from Mo'orea, French Polynesia. Using long-read PacBio HiFi and Illumina RNA-seq, we generated the most complete <i>Acropora</i> genome to date (BUSCO completeness of 96.7% for metazoan genes). The assembly size is 518 Mbp, with 174 scaffolds and a scaffold N50 of 17 Mbp. We predicted 40,518 protein-coding genes and identified 16.74% of the genome as repeats. DNA methylation in the CpG context is 14.6%. This assembly of the <i>A. pulchra</i> genome and DNA methylome will support studies of coastal corals in French Polynesia, aiding conservation and comparative studies of <i>Acropora</i> and cnidarians.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte153"},"PeriodicalIF":0.0,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11985253/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144060361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CompactTree: a lightweight header-only C++ library and Python wrapper for ultra-large phylogenetics.","authors":"Niema Moshiri","doi":"10.46471/gigabyte.152","DOIUrl":"10.46471/gigabyte.152","url":null,"abstract":"<p><p>The study of viral and bacterial species requires the ability to load and traverse ultra-large phylogenies with tens of millions of tips, but existing tree libraries struggle to scale to these sizes. We introduce CompactTree, a lightweight header-only C++ library with a user-friendly Python wrapper for traversing ultra-large trees that can be easily incorporated into other tools. We show that CompactTree is orders of magnitude faster and requires orders of magnitude less memory than existing tree packages. CompactTree is freely accessible as an open source project: https://github.com/niemasd/CompactTree.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte152"},"PeriodicalIF":0.0,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11921128/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143665474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Portable-CELLxGENE: standalone executables of CELLxGENE for easy installation.","authors":"George T Hall","doi":"10.46471/gigabyte.151","DOIUrl":"10.46471/gigabyte.151","url":null,"abstract":"<p><p>Biologists who want to analyse their single-cell transcriptomics dataset must install and use specialist software via the command line. This is often impractical for non-bioinformaticians. Whilst the popular CELLxGENE software provides an intuitive graphical interface to facilitate analysis outside the command line, its server-side installation and execution remain complex. A version that is easier to install and run would allow non-bioinformaticians to take advantage of this valuable tool without needing to use the command line. This work introduces Portable-CELLxGENE, a standalone distribution of CELLxGENE that can be installed via a graphical interface. It contains an easy-to-use extension of the CELLxGENE-Gateway Python package to allow the analysis of multiple datasets. This tool enables non-bioinformaticians to carry out simple analyses independently.</p><p><strong>Availability and implementation: </strong>Versions of Portable-CELLxGENE for Windows and MacOS, along with source code, are available at https://george-hall-ucl.github.io/Portable-CELLxGENE-Docs. It is licensed under the GNU General Public License v3.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte151"},"PeriodicalIF":0.0,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11894539/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The assembly and annotation of two teinturier grapevine varieties, Dakapo and Rubired.","authors":"Eleanore J Ritter, Noé Cochetel, Andrea Minio, Peter Cousins, Dario Cantu, Chad Niederhuth","doi":"10.46471/gigabyte.149","DOIUrl":"10.46471/gigabyte.149","url":null,"abstract":"<p><p>Teinturier grapevines, known for berries with pigmented flesh due to anthocyanin production, are valuable for enhancing the pigmentation of wine, for potential health benefits, and for investigating anthocyanin production in plants. Here, we assembled and annotated the genomes of two teinturier varieties, Dakapo and Rubired. For Dakapo, we combined Nanopore sequencing, Illumina sequencing, and scaffolding to the existing grapevine assembly to generate a final assembly of 508.5 Mbp. Combining <i>de novo</i> annotation with annotations lifted over from the existing grapevine reference produced 36,940 gene annotations for Dakapo. For Rubired, PacBio HiFi reads were assembled, scaffolded, and phased to generate a diploid assembly with two haplotypes 474.7-476.0 Mbp long. <i>De novo</i> annotation of the diploid Rubired genome yielded annotations for 56,681 genes. Both genomes are highly contiguous and complete. The Dakapo and Rubired genome assemblies provide genetic resources for investigations into berry flesh pigmentation and other traits of interest in grapevine.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte149"},"PeriodicalIF":0.0,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11891882/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143598414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Draft genome of the endangered visayan spotted deer (<i>Rusa alfredi</i>), a Philippine endemic species.","authors":"Ma Carmel F Javier, Albert C Noblezada, Persie Mark Q Sienes, Robert S Guino-O, Nadia Palomar-Abesamis, Maria Celia D Malay, Carmelo S Del Castillo, Victor Marco Emmanuel N Ferriols","doi":"10.46471/gigabyte.150","DOIUrl":"10.46471/gigabyte.150","url":null,"abstract":"<p><p>The Visayan Spotted Deer (VSD), or <i>Rusa alfredi</i>, is an endangered and endemic species in the Philippines. Despite its status, genomic information on <i>R. alfredi</i>, and the genus <i>Rusa</i> in general, is missing. This study presents the first draft genome assembly of the VSD using Illumina short-read sequencing. The resulting RusAlf_1.1 assembly has a total length of 2.52 Gb, with a contig N50 of 46 kb and a scaffold N50 of 75 Mb. The assembly has a BUSCO complete score of 95.5%, demonstrating the genome's completeness, and includes the annotation of 24,531 genes. Our phylogenetic analysis based on single-copy orthologs revealed a close evolutionary relationship between <i>R. alfredi</i> and the genus <i>Cervus</i>. RusAlf_1.1 represents a significant advancement in our understanding of the VSD. It opens opportunities for further research in population genetics and evolutionary biology, potentially contributing to more effective conservation and management strategies for this endangered species.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte150"},"PeriodicalIF":0.0,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11876970/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143560181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SqueezeCall: nanopore basecalling using a Squeezeformer network.","authors":"Zhongxu Zhu","doi":"10.46471/gigabyte.148","DOIUrl":"10.46471/gigabyte.148","url":null,"abstract":"<p><p>Nanopore sequencing, a third-generation sequencing technique, enables direct RNA sequencing, real-time analysis, and long read lengths. Nanopore sequencers measure electrical current changes as nucleotides pass through nanopores; a basecaller identifies base sequences from the raw current measurements. However, accurate basecalling remains challenging due to molecular variations and sequencing noise. Here, we introduce SqueezeCall, a novel Squeezeformer-based model for accurate nanopore basecalling. SqueezeCall uses convolution layers to down-sample raw signals and model local dependencies. A Squeezeformer network captures the global context, and a connectionist temporal classification (CTC) decoder with beam search generates DNA sequences. Experimental results demonstrated SqueezeCall's ability to resist noise, improving basecalling accuracy. We trained SqueezeCall with a combination of three loss types and found that all three contribute to basecalling accuracy. Experiments across multiple species demonstrated the potential of a Squeezeformer-based model to improve basecalling accuracy and its superiority over recurrent neural network-based models and Transformer-based models.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte148"},"PeriodicalIF":0.0,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11851125/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143506532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A practical DNA data storage using an expanded alphabet introducing 5-methylcytosine.","authors":"Deruilin Liu, Demin Xu, Liuxin Shi, Jiayuan Zhang, Kewei Bi, Bei Luo, Chen Liu, Yuxiang Li, Guangyi Fan, Wen Wang, Zhi Ping","doi":"10.46471/gigabyte.147","DOIUrl":"10.46471/gigabyte.147","url":null,"abstract":"<p><p>The DNA molecule is a promising next-generation data storage medium. Recently, it has been theoretically proposed that non-natural or modified bases can serve as extra molecular letters to increase the information density. However, this strategy is challenging due to the difficulty of synthesizing non-natural DNA sequences and their complex structure. Here, we describe a practical DNA data storage transcoding scheme, named R+, based on an expanded molecular alphabet that introduces 5-methylcytosine (5mC). We validated it experimentally by encoding one representative file into several 1.3–1.6 kbp <i>in vitro</i> DNA fragments for nanopore sequencing. Our results show average data recovery rates of 98.97% and 86.91% with and without a reference, respectively. Our work validates the practicability of 5mC in DNA storage systems, with a potentially wide range of applications.</p><p><strong>Availability and implementation: </strong>R+ is implemented in Python and the code is available under an MIT license at https://github.com/Incpink-Liu/DNA-storage-R_plus.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte147"},"PeriodicalIF":0.0,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11791762/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143191462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biodepot Launcher: an app to install, manage and launch bioinformatics workflows.","authors":"Ling-Hong Hung, Thomas J Dahlstrom, Johnalbert Garnica, Emmanuel Munoz, Robert Schmitz, Ka Yee Yeung","doi":"10.46471/gigabyte.146","DOIUrl":"10.46471/gigabyte.146","url":null,"abstract":"<p><p>We present the Biodepot Launcher, a desktop application that facilitates installation, management and deployment of bioinformatics workflows using the Biodepot-workflow-builder (Bwb). With the new app, Bwb can be started by double-clicking on an icon, eliminating the need to type cryptic startup commands into the terminal. This creates an end-to-end graphical and easy-to-use interface to manage and launch containerized workflows on the local computer or cloud instances. Biodepot Launcher is written in React and JavaScript, and uses the node.js framework Neutralinojs and web browser routines to allow the application to execute on Linux, Windows and Mac desktop environments.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte146"},"PeriodicalIF":0.0,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11751628/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143029757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}