{"title":"Chromosomal-level genome assembly of golden birdwing <i>Troides aeacus</i> (Felder & Felder, 1860).","authors":"","doi":"10.46471/gigabyte.122","DOIUrl":"10.46471/gigabyte.122","url":null,"abstract":"<p><p>The golden birdwing <i>Troides aeacus</i> (Lepidoptera, Papilionidae), a significant species in Asia, faces habitat loss due to urbanization and human activities, necessitating its protection. However, the lack of genomic resources hinders our understanding of its biology and diversity, and impedes conservation efforts based on genetic information or markers. Here, we present the first chromosomal-level genome assembly of <i>T. aeacus</i> using PacBio SMRT and Omni-C scaffolding technologies. The assembled genome (351 Mb) contains 98.94% of the sequences anchored to 30 pseudo-molecules. The genome assembly has high sequence continuity with contig length N50 = 11.67 Mb and L50 = 14, and scaffold length N50 = 12.2 Mb and L50 = 13. A total of 24,946 protein-coding genes were predicted, with high BUSCO score completeness (98.8% and 94.7% of genome and proteome BUSCO, respectively). This genome offers a significant resource for understanding the swallowtail butterfly biology and carrying out its conservation.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte122"},"PeriodicalIF":0.0,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11068028/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140874142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chromosomal-level genome assembly of the long-spined sea urchin <i>Diadema setosum</i> (Leske, 1778).","authors":"","doi":"10.46471/gigabyte.121","DOIUrl":"10.46471/gigabyte.121","url":null,"abstract":"<p><p>The long-spined sea urchin <i>Diadema setosum</i> is an algal and coral feeder widely distributed in the Indo-Pacific that can cause severe bioerosion on the reef community. However, the lack of genomic information has hindered the study of its ecology and evolution. Here, we report the chromosomal-level genome (885.8 Mb) of the long-spined sea urchin <i>D. setosum</i> using a combination of PacBio long-read sequencing and Omni-C scaffolding technology. The assembled genome contains a scaffold N50 length of 38.3 Mb, 98.1% of complete BUSCO (Geno, metazoa_odb10) genes (the single copy score is 97.8% and the duplication score is 0.3%), and 98.6% of the sequences are anchored to 22 pseudo-molecules/chromosomes. A total of 27,478 gene models were annotated, reaching a total of 28,414 transcripts, including 5,384 tRNA and 23,030 protein-coding genes. The high-quality genome of <i>D. setosum</i> presented here is a valuable resource for the ecological and evolutionary studies of this coral reef-associated sea urchin.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte121"},"PeriodicalIF":0.0,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11066563/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140860904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chromosome-level genome assembly of the common chiton, <i>Liolophura japonica</i> (Lischke, 1873).","authors":"","doi":"10.46471/gigabyte.123","DOIUrl":"10.46471/gigabyte.123","url":null,"abstract":"<p><p>Chitons (Polyplacophora) are marine molluscs that can be found worldwide from cold waters to the tropics, and play important ecological roles in the environment. However, only two chiton genomes have been sequenced to date. The chiton <i>Liolophura japonica</i> (Lischke, 1873) is one of the most abundant polyplacophorans found throughout East Asia. Our PacBio HiFi reads and Omni-C sequencing data resulted in a high-quality near chromosome-level genome assembly of ∼609 Mb with a scaffold N50 length of 37.34 Mb (96.1% BUSCO). A total of 28,233 genes were predicted, including 28,010 protein-coding ones. The repeat content (27.89%) was similar to that of other Chitonidae species and approximately three times lower than that of the Hanleyidae chiton genome. The genomic resources provided by this work will help to expand our understanding of the evolution of molluscs and the ecological adaptation of chitons.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte123"},"PeriodicalIF":0.0,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11068029/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140869055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome assembly of the edible jelly fungus <i>Dacryopinax spathularia (Dacrymycetaceae)</i>.","authors":"","doi":"10.46471/gigabyte.120","DOIUrl":"10.46471/gigabyte.120","url":null,"abstract":"<p><p>The edible jelly fungus <i>Dacryopinax spathularia</i> (<i>Dacrymycetaceae</i>) is wood-decaying and can be commonly found worldwide. It has found application in food additives, given its ability to synthesize long-chain glycolipids, among other uses. In this study, we present the genome assembly of <i>D. spathularia</i> using a combination of PacBio HiFi reads and Omni-C data. The genome size is 29.2 Mb. It has high sequence contiguity and completeness, with a scaffold N50 of 1.925 Mb and a 92.0% BUSCO score. A total of 11,510 protein-coding genes and 474.7 kb repeats (accounting for 1.62% of the genome) were predicted. The <i>D. spathularia</i> genome assembly generated in this study provides a valuable resource for understanding their ecology, such as their wood-decaying capability, their evolutionary relationships with other fungi, and their unique biology and applications in the food industry.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte120"},"PeriodicalIF":0.0,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11066560/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140874143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome assembly of the milky mangrove <i>Excoecaria agallocha</i>.","authors":"","doi":"10.46471/gigabyte.119","DOIUrl":"10.46471/gigabyte.119","url":null,"abstract":"<p><p>The milky mangrove <i>Excoecaria agallocha</i> is a latex-secreting mangrove that is distributed in tropical and subtropical regions. While its poisonous latex is regarded as a potential source of phytochemicals for biomedical applications, the genomic resources of <i>E. agallocha</i> remain limited. Here, we present a chromosomal-level genome of <i>E. agallocha</i>, assembled from the combination of PacBio long-read sequencing and Omni-C data. The resulting assembly size is 1,332.45 Mb and has high contiguity and completeness with a scaffold N50 of 58.9 Mb and a BUSCO score of 98.4%, with 86.08% of sequences anchored to 18 pseudomolecules. A total of 73,740 protein-coding genes were also predicted. The milky mangrove genome provides a useful resource for further understanding the biosynthesis of phytochemical compounds in <i>E. agallocha</i>.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte119"},"PeriodicalIF":0.0,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11066562/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140854565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging Biodiversity and Health: The Global Biodiversity Information Facility's initiative on open data on vectors of human diseases.","authors":"Paloma Shimabukuro, Quentin Groom, Florence Fouque, Lindsay Campbell, Theeraphap Chareonviriyaphap, Josiane Etang, Sylvie Manguin, Marianne Sinka, Dmitry Schigel, Kate Ingenloff","doi":"10.46471/gigabyte.117","DOIUrl":"10.46471/gigabyte.117","url":null,"abstract":"<p><p>There is an increased awareness of the importance of data publication, data sharing, and open science to support research, monitoring and control of vector-borne disease (VBD). Here we describe the efforts of the Global Biodiversity Information Facility (GBIF) as well as the World Health Special Programme on Research and Training in Diseases of Poverty (TDR) to promote publication of data related to vectors of diseases. In 2020, a GBIF task group of experts was formed to provide advice and support efforts aimed at enhancing the coverage and accessibility of data on vectors of human diseases within GBIF. Various strategies, such as organizing training courses and publishing data papers, were used to increase this content. This editorial introduces the outcome of a second call for data papers partnered by the TDR, GBIF and GigaScience Press in the journal <i>GigaByte</i>. Biodiversity and infectious diseases are linked in complex ways. These links can involve changes from the microorganism level to that of the habitat, and there are many ways in which these factors interact to affect human health. One way to tackle disease control, and possibly elimination, is to provide stakeholders with access to a wide range of data shared under the FAIR principles, so it is possible to support early detection, analyses and evaluation, and to promote policy improvements and/or development.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte117"},"PeriodicalIF":0.0,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11027195/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140860840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Snakemake workflows for long-read bacterial genome assembly and evaluation.","authors":"Peter Menzel","doi":"10.46471/gigabyte.116","DOIUrl":"10.46471/gigabyte.116","url":null,"abstract":"<p><p>With the advancement of long-read sequencing technologies and their increasing use for bacterial genomics, several methods for generating genome assemblies from error-prone long reads have been developed. These are complemented by various tools for assembly polishing using either long reads, short reads, or reference genomes. End users are therefore left with a plethora of possible combinations of programs for obtaining a final trusted assembly. Hence, there is also a need to measure the completeness and accuracy of such assemblies, for which, again, several evaluation methods implemented in various programs are available. In order to automatically run multiple genome assembly and evaluation programs at once, I developed two workflows for the workflow management system Snakemake, which provide end users with an easy-to-run solution for testing various genome assemblies from their sequencing data. Both workflows use the conda packaging system, so there is no need for manual installation of each program.</p><p><strong>Availability & implementation: </strong>The workflows are available as open source software under the MIT license at github.com/pmenzel/ont-assembly-snake and github.com/pmenzel/score-assemblies.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte116"},"PeriodicalIF":0.0,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11000499/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140874304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Whole genome assembly and annotation of the King Angelfish (<i>Holacanthus passer</i>) gives insight into the evolution of marine fishes of the Tropical Eastern Pacific.","authors":"Remy Gatins, Carlos F Arias, Carlos Sánchez, Giacomo Bernardi, Luis F De León","doi":"10.46471/gigabyte.115","DOIUrl":"10.46471/gigabyte.115","url":null,"abstract":"<p><p><i>Holacanthus</i> angelfishes are some of the most iconic marine fishes of the Tropical Eastern Pacific (TEP). However, very limited genomic resources currently exist for the genus. In this study we: (i) assembled and annotated the nuclear genome of the King Angelfish (<i>Holacanthus passer</i>), and (ii) examined the demographic history of <i>H. passer</i> in the TEP. We generated 43.8 Gb of ONT and 97.3 Gb Illumina reads representing 75× and 167× coverage, respectively. The final genome assembly size was 583 Mb with a contig N50 of 5.7 Mb, which captured 97.5% of the complete Actinopterygii Benchmarking Universal Single-Copy Orthologs (BUSCOs). Repetitive elements accounted for 5.09% of the genome, and 33,889 protein-coding genes were predicted, of which 22,984 were functionally annotated. Our demographic analysis suggests that population expansions of <i>H. passer</i> occurred prior to the last glacial maximum (LGM) and were more likely shaped by events associated with the closure of the Isthmus of Panama. This result is surprising, given that most rapid population expansions in both freshwater and marine organisms have been reported to occur globally after the LGM. Overall, this annotated genome assembly provides a novel molecular resource to study the evolution of <i>Holacanthus</i> angelfishes, while facilitating research into local adaptation, speciation, and introgression in marine fishes.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte115"},"PeriodicalIF":0.0,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10973836/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140320042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular Property Diagnostic Suite for COVID-19 (MPDS<sup>COVID-19</sup>): an open-source disease-specific drug discovery portal.","authors":"Lipsa Priyadarsinee, Esther Jamir, Selvaraman Nagamani, Hridoy Jyoti Mahanta, Nandan Kumar, Lijo John, Himakshi Sarma, Asheesh Kumar, Anamika Singh Gaur, Rosaleen Sahoo, S Vaikundamani, N Arul Murugan, U Deva Priyakumar, G P S Raghava, Prasad V Bharatam, Ramakrishnan Parthasarathi, V Subramanian, G Madhavi Sastry, G Narahari Sastry","doi":"10.46471/gigabyte.114","DOIUrl":"10.46471/gigabyte.114","url":null,"abstract":"<p><p>Molecular Property Diagnostic Suite (MPDS) was conceived and developed as an open-source disease-specific web portal based on Galaxy. MPDS<sup>COVID-19</sup> was developed for COVID-19 as a one-stop solution for drug discovery research. Galaxy platforms enable the creation of customized workflows connecting various modules in the web server. The architecture of MPDS<sup>COVID-19</sup> effectively employs Galaxy v22.04 features, which are ported on CentOS 7.8 and Python 3.7. MPDS<sup>COVID-19</sup> provides significant updates and the addition of several new tools updated after six years. Tools developed by our group in Perl/Python and open-source tools are collated and integrated into MPDS<sup>COVID-19</sup> using XML scripts. Our MPDS suite aims to facilitate transparent and open innovation. This approach significantly helps bring inclusiveness in the community while promoting free access and participation in software development.</p><p><strong>Availability & implementation: </strong>The MPDS<sup>COVID-19</sup> portal can be accessed at https://mpds.neist.res.in:8085/.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte114"},"PeriodicalIF":0.0,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10958779/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140208383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models.","authors":"Sami Hamdan, Shammi More, Leonard Sasse, Vera Komeyer, Kaustubh R Patil, Federico Raimondo","doi":"10.46471/gigabyte.113","DOIUrl":"10.46471/gigabyte.113","url":null,"abstract":"<p><p>The fast-paced development of machine learning (ML) and its increasing adoption in research challenge researchers without extensive training in ML. In neuroscience, ML can help understand brain-behavior relationships, diagnose diseases and develop biomarkers using data from sources like magnetic resonance imaging and electroencephalography. Primarily, ML builds models to make accurate predictions on unseen data. Researchers evaluate models' performance and generalizability using techniques such as cross-validation (CV). However, choosing a CV scheme and evaluating an ML pipeline is challenging and, if done improperly, can lead to overestimated results and incorrect interpretations. Here, we created julearn, an open-source Python library allowing researchers to design and evaluate complex ML pipelines without encountering common pitfalls. We present the rationale behind julearn's design, its core features, and showcase three examples of previously published research projects. Julearn simplifies access to ML by providing an easy-to-use environment. With its design, unique features, simple interface, and practical documentation, it serves as a useful Python-based library for research projects.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte113"},"PeriodicalIF":0.0,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10940896/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140144689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}