{"title":"Automated management of AWS instances for training.","authors":"Jorge Buenabad-Chavez, Evelyn Greeves, James P J Chong, Emma Rand","doi":"10.46471/gigabyte.133","DOIUrl":"https://doi.org/10.46471/gigabyte.133","url":null,"abstract":"<p><p>Amazon Web Services (AWS) instances provide a convenient way to run training on complex 'omics data analysis workflows without requiring participants to install software packages or store large data volumes locally. However, efficiently managing dozens of instances is challenging for training providers. We present a set of Bash scripts that make it quick and easy to manage Linux AWS instances pre-configured with all the software analysis tools and data needed for a course, and accessible using encrypted login keys and optional domain names. Creating over 30 instances takes 10-15 minutes. A comprehensive online tutorial describes how to set up and use an AWS account and the scripts, and how to customise AWS instance templates with other software tools and data. We anticipate that others offering similar training may benefit from using the scripts regardless of the analyses being taught.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte133"},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11382607/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142302548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chromosomal-level genome assembly and single-nucleotide polymorphism sites of black-faced spoonbill <i>Platalea minor</i>.","authors":"","doi":"10.46471/gigabyte.130","DOIUrl":"10.46471/gigabyte.130","url":null,"abstract":"<p><p><i>Platalea minor</i>, or black-faced spoonbill (Threskiornithidae), is a wading bird confined to coastal areas in East Asia. Due to habitat destruction, it was classified as globally endangered by the International Union for Conservation of Nature. However, the lack of genomic resources for this species hinders the understanding of its biology and diversity, and the development of conservation measures. Here, we report the first chromosomal-level genome assembly of <i>P. minor</i> using a combination of PacBio SMRT and Omni-C scaffolding technologies. The assembled genome (1.24 Gb) contains 95.33% of the sequences anchored to 31 pseudomolecules. The genome assembly has high sequence continuity with scaffold length N50 = 53 Mb. We predicted 18,780 protein-coding genes and measured high BUSCO score completeness (97.3%). Finally, we revealed 6,155,417 bi-allelic single nucleotide polymorphisms, accounting for ∼5% of the genome. This resource offers new opportunities for studying the black-faced spoonbill and developing conservation measures for this species.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"1-13"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11273517/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141790211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kinship analysis and pedigree reconstruction by RAD sequencing in cattle.","authors":"Yiming Xu, Wanqiu Wang, Jiefeng Huang, Minjie Xu, Binhu Wang, Yingsong Wu, Yongzhong Xie, Jianbo Jian","doi":"10.46471/gigabyte.131","DOIUrl":"10.46471/gigabyte.131","url":null,"abstract":"<p><p>Kinship and pedigree, used for estimating inbreeding, heritability, selection, and gene flow, are useful for breeding and animal conservation. However, as the size of crossbred populations increases, inaccurate generation and parentage assignment in livestock farms increase. Restriction-site-associated DNA sequencing is a cost-effective platform for single nucleotide polymorphism (SNP) discovery and genotyping. Here, we performed a kinship analysis and pedigree reconstruction for Angus and Xiangxi yellow cattle. A total of 975 cattle, including 923 offspring with 24 known sires and 28 known dams, were sampled and subjected to SNP discovery and genotyping. The identified SNP panel included 7,305 SNPs capturing the maximum difference between paternal and maternal genome information, allowing us to distinguish F1 from F2 generations with 90% accuracy. In conclusion, we provided a low-cost and efficient SNP panel for kinship analyses and the improvement of local genetic resources, which are valuable for breed improvement, local resource utilization, and conservation.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"1-15"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11273509/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141790212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-coverage whole genome sequencing for a highly selective cohort of severe COVID-19 patients.","authors":"Renato Santos, Víctor Moreno-Torres, Ilduara Pintos, Octavio Corral, Carmen de Mendoza, Vicente Soriano, Manuel Corpas","doi":"10.46471/gigabyte.127","DOIUrl":"10.46471/gigabyte.127","url":null,"abstract":"<p><p>Despite the advances in genetic marker identification associated with severe COVID-19, the full genetic characterisation of the disease remains elusive. This study explores imputation in low-coverage whole genome sequencing for a severe COVID-19 patient cohort. We generated a dataset of 79 imputed variant call format files using the GLIMPSE1 tool, each containing an average of 9.5 million single nucleotide variants. Validation revealed a high imputation accuracy (squared Pearson correlation ≍0.97) across sequencing platforms, showcasing GLIMPSE1's ability to confidently impute variants with minor allele frequencies as low as 2% in individuals with Spanish ancestry. We carried out a comprehensive analysis of the patient cohort, examining hospitalisation and intensive care utilisation, sex and age-based differences, and clinical phenotypes using a standardised set of medical terms developed to characterise severe COVID-19 symptoms. The methods and findings presented here can be leveraged for future genomic projects to gain vital insights into health challenges like COVID-19.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte127"},"PeriodicalIF":0.0,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11211761/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141473253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PhysiCell Studio: a graphical tool to make agent-based modeling more accessible.","authors":"Randy Heiland, Daniel Bergman, Blair Lyons, Grant Waldow, Julie Cass, Heber Lima da Rocha, Marco Ruscone, Vincent Noël, Paul Macklin","doi":"10.46471/gigabyte.128","DOIUrl":"10.46471/gigabyte.128","url":null,"abstract":"<p><p>Defining a multicellular model can be challenging. There may be hundreds of parameters that specify the attributes and behaviors of objects. In the best case, the model will be defined using some format specification - a markup language - that will provide easy model sharing (and a minimal step toward reproducibility). PhysiCell is an open-source, physics-based multicellular simulation framework with an active and growing user community. It uses XML to define a model and, traditionally, users needed to manually edit the XML to modify the model. PhysiCell Studio is a tool to make this task easier. It provides a GUI that allows editing the XML model definition, including the creation and deletion of fundamental objects: cell types and substrates in the microenvironment. It also lets users build their model by defining initial conditions and biological rules, run simulations, and view results interactively. PhysiCell Studio has evolved over multiple workshops and academic courses in recent years, which has led to many improvements. There is both a desktop and cloud version. Its design and development has benefited from an active undergraduate and graduate research program. Like PhysiCell, the Studio is open-source software and contributions from the community are encouraged.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte128"},"PeriodicalIF":0.0,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11211762/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141473254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multicellular, IVT-derived, unmodified human transcriptome for nanopore-direct RNA analysis.","authors":"Caroline A McCormick, Stuart Akeson, Sepideh Tavakoli, Dylan Bloch, Isabel N Klink, Miten Jain, Sara H Rouhanifard","doi":"10.46471/gigabyte.129","DOIUrl":"10.46471/gigabyte.129","url":null,"abstract":"<p><p>Nanopore direct RNA sequencing (DRS) enables measurements of RNA modifications. Modification-free transcripts are a practical and targeted control for DRS, providing a baseline measurement for canonical nucleotides within a matched and biologically-derived sequence context. However, these controls can be challenging to generate and carry nanopore-specific nuances that can impact analyses. We produced DRS datasets using modification-free transcripts from <i>in vitro</i> transcription of cDNA from six immortalized human cell lines. We characterized variation across cell lines and demonstrated how these may be interpreted. These data will serve as a versatile control and resource to the community for RNA modification analyses of human transcripts.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte129"},"PeriodicalIF":0.0,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11221353/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141499780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Get Free Copy: a multi-repository search platform for biomedical publications.","authors":"Nodir Kosimkhujaev, Kuan-Lin Huang","doi":"10.46471/gigabyte.126","DOIUrl":"10.46471/gigabyte.126","url":null,"abstract":"<p><p>We introduce Get Free Copy (https://getfreecopy.com), a web-based platform designed to streamline the search for biomedical literature across major repositories like arXiv, bioRxiv, medRxiv, and PubMed Central (PMC). Addressing challenges posed by paywalls and fragmented databases, it offers a unified interface for efficient retrieval of free, legitimate copies of biomedical literature. The platform's implementation involves a Node.js backend and dynamic front-end display, enhancing accessibility and research efficiency. As an open-source project, Get Free Copy represents a significant contribution to the open-access movement, inviting global researcher collaboration for further development.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte126"},"PeriodicalIF":0.0,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11154096/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141285565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome assembly of the rare and endangered Grantham's camellia, <i>Camellia granthamiana</i>.","authors":"","doi":"10.46471/gigabyte.124","DOIUrl":"10.46471/gigabyte.124","url":null,"abstract":"<p><p>Grantham's camellia (<i>Camellia granthamiana</i> Sealy) is a rare and endangered tea species discovered in Hong Kong in 1955 and endemic to southern China. Despite its high conservation value, the genomic resources of <i>C. granthamiana</i> are limited. Here, we present a chromosome-scale draft genome of the tetraploid <i>C. granthamiana</i> (2<i>n</i> = 4<i>x</i> = 60), combining PacBio long-read sequencing and Omni-C data. The assembled genome size is ∼2.4 Gb, with most sequences anchored to 15 pseudochromosomes resembling a monoploid genome. The genome has high contiguity, with a scaffold N50 of 139.7 Mb, and high completeness (97.8% BUSCO score). Our gene model prediction resulted in 68,032 protein-coding genes (BUSCO score of 90.9%). We annotated 1.65 Gb of repeat content (68.48% of the genome). Our Grantham's camellia genome assembly is a valuable resource for investigating Grantham's camellia's biology, ecology, and phylogenomic relationships with other <i>Camellia</i> species, and provides a foundation for further conservation measures.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte124"},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11131091/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141163069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Jellyfish in Hong Kong: a citizen science dataset.","authors":"John Terenzini, Yannan Fan, Melissa Jean-Yi Liu, Laura J Falkenberg","doi":"10.46471/gigabyte.125","DOIUrl":"10.46471/gigabyte.125","url":null,"abstract":"<p><p>The Hong Kong Jellyfish Project is a citizen science initiative started in early 2021 to enhance our understanding of jellyfish in Hong Kong. Here, we present a dataset of jellyfish sightings collected by citizen scientists from 2021 through 2023 within local waters. Citizen scientists submitted photographs and other data (time, date, and location) using a website, iNaturalist project, and social media. Sightings were validated using references from the literature. A total of 1,020 usable observations are included in this dataset, showing the occurrence and distribution of jellyfish in Hong Kong in 2021-2023. This dataset is now publicly available and discoverable in the Global Biodiversity Information Facility database and is available for download. This data can be used to enhance our understanding of the biodiversity of local marine ecosystems.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte125"},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11131163/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141163074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"<i>De novo</i> transcriptome assembly and genome annotation of the fat-tailed dunnart (<i>Sminthopsis crassicaudata</i>).","authors":"Neke Ibeh, Charles Y Feigin, Stephen R Frankenberg, Davis J McCarthy, Andrew J Pask, Irene Gallego Romero","doi":"10.46471/gigabyte.118","DOIUrl":"10.46471/gigabyte.118","url":null,"abstract":"<p><p>Marsupials exhibit distinctive modes of reproduction and early development that set them apart from their eutherian counterparts and render them invaluable for comparative studies. However, marsupial genomic resources still lag far behind those of eutherian mammals. We present a series of novel genomic resources for the fat-tailed dunnart (<i>Sminthopsis crassicaudata</i>), a mouse-like marsupial that, due to its ease of husbandry and <i>ex-utero</i> development, is emerging as a laboratory model. We constructed a highly representative multi-tissue <i>de novo</i> transcriptome assembly of dunnart RNA-seq reads spanning 12 tissues. The transcriptome includes 2,093,982 assembled transcripts and has a mammalian transcriptome BUSCO completeness score of 93.3%, the highest amongst currently published marsupial transcriptomes. This global transcriptome, along with <i>ab initio</i> predictions, supported annotation of the existing dunnart genome, revealing 21,622 protein-coding genes. Altogether, these resources will enable wider use of the dunnart as a model marsupial and deepen our understanding of mammalian genome evolution.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte118"},"PeriodicalIF":0.0,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11091235/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140923702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}