Eamon Winden, Alejandro Vasquez-Echeverri, Susana Calle-Castañeda, Yumin Lian, Juan Pablo Hernandez Ortiz, David C Schwartz
{"title":"A database of restriction maps to expand the utility of bacterial artificial chromosomes.","authors":"Eamon Winden, Alejandro Vasquez-Echeverri, Susana Calle-Castañeda, Yumin Lian, Juan Pablo Hernandez Ortiz, David C Schwartz","doi":"10.46471/gigabyte.93","DOIUrl":"10.46471/gigabyte.93","url":null,"abstract":"<p><p>While Bacterial Artificial Chromosomes libraries were once a key resource for the genomic community, they have been obviated, for sequencing purposes, by long-read technologies. Such libraries may now serve as a valuable resource for manipulating and assembling large genomic constructs. To enhance accessibility and comparison, we have developed a BAC restriction map database. Using information from the National Center for Biotechnology Information's cloneDB FTP site, we constructed a database containing the restriction maps for both uniquely placed and insert-sequenced BACs from 11 libraries covering the recognition sequences of the available restriction enzymes. Along with the database, we generated a set of Python functions to reconstruct the database and more easily access the information within. This data is valuable for researchers simply using BACs, as well as those working with larger sections of the genome in terms of synthetic genes, large-scale editing, and mapping.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte93"},"PeriodicalIF":0.0,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10518450/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41164956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aine Fairbrother-Browne, Sonia García-Ruiz, Regina Hertfelder Reynolds, Mina Ryten, Alan Hodgkinson
{"title":"ensemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R.","authors":"Aine Fairbrother-Browne, Sonia García-Ruiz, Regina Hertfelder Reynolds, Mina Ryten, Alan Hodgkinson","doi":"10.46471/gigabyte.91","DOIUrl":"10.46471/gigabyte.91","url":null,"abstract":"<p><p>We present ensemblQueryR, an R package for querying Ensembl linkage disequilibrium (LD) endpoints. This package is flexible, fast and user-friendly, and optimised for high-throughput querying. ensemblQueryR uses functions that are intuitive and amenable to custom code integration, familiar R object types as inputs and outputs as well as providing parallelisation functionality. For each Ensembl LD endpoint, ensemblQueryR provides two functions, permitting both single- and multi-query modes of operation. The multi-query functions are optimised for large query sizes and provide optional parallelisation to leverage available computational resources and minimise processing time. We demonstrate improved computational performance of ensemblQueryR over an existing tool in terms of random access memory (RAM) usage and speed, delivering a 10-fold speed increase whilst using a third of the RAM. 
Finally, ensemblQueryR is near-agnostic to operating system and computational architecture through Docker and singularity images, making this tool widely accessible to the scientific community.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10507293/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41153439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xinyu Wang, Lirong Liu, Wenbiao Zhu, Shiqing Wang, Minhui Shi, Shuhui Yang, Haorong Lu, Jun Cao
{"title":"Genome assembly and annotation of the Sharp-nosed Pit Viper <i>Deinagkistrodon acutus</i> based on next-generation sequencing data.","authors":"Xinyu Wang, Lirong Liu, Wenbiao Zhu, Shiqing Wang, Minhui Shi, Shuhui Yang, Haorong Lu, Jun Cao","doi":"10.46471/gigabyte.88","DOIUrl":"10.46471/gigabyte.88","url":null,"abstract":"<p><p>The study of the currently known >3,000 species of snakes can provide valuable insights into the evolution of their genomes. <i>Deinagkistrodon acutus</i>, also known as Sharp-nosed Pit Viper, one hundred-pacer viper or five-pacer viper, is a venomous snake with significant economic, medicinal and scientific importance. Widely distributed in southeastern China and South-East Asia, <i>D. acutus</i> has been primarily studied for its venom. Here, we employed next-generation sequencing to assemble and annotate a highly continuous genome of <i>D. acutus</i>. The genome size is 1.46 Gb; its scaffold N50 length is 6.21 Mb, the repeat content is 42.81%, and 24,402 functional genes were annotated. This study helps to further understand and utilize <i>D. acutus</i> and its venom at the genetic level.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte88"},"PeriodicalIF":0.0,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10498098/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10268545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lucia Corte, Lathan Liou, Paul F O'Reilly, Judit García-González
{"title":"Trumpet plots: visualizing the relationship between allele frequency and effect size in genetic association studies.","authors":"Lucia Corte, Lathan Liou, Paul F O'Reilly, Judit García-González","doi":"10.46471/gigabyte.89","DOIUrl":"10.46471/gigabyte.89","url":null,"abstract":"<p><p>Recent advances in genome-wide association and sequencing studies have shown that the genetic architecture of complex traits and diseases involves a combination of rare and common genetic variants distributed throughout the genome. One way to better understand this architecture is to visualize genetic associations across a wide range of allele frequencies. However, there is currently no standardized or consistent graphical representation for effectively illustrating these results. Here we propose a standardized approach for visualizing the effect size of risk variants across the allele frequency spectrum. The proposed plots have a distinctive trumpet shape: with the majority of variants having high frequency and small effects, and a small number of variants having lower frequency and larger effects. To demonstrate the utility of trumpet plots in illustrating the relationship between the number of variants, their frequency, and the magnitude of their effects in shaping the genetic architecture of complex traits and diseases, we generated trumpet plots for more than one hundred traits in the UK Biobank. 
To facilitate their broader use, we developed an R package, 'TrumpetPlots' (available at the Comprehensive R Archive Network) and R Shiny application, 'Shiny Trumpets' (available at https://juditgg.shinyapps.io/shinytrumpets/) that allows users to explore these results and submit their own data.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte89"},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10498096/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10268544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sonia García-Ruiz, Regina Hertfelder Reynolds, Melissa Grant-Peters, Emil Karl Gustavsson, Aine Fairbrother-Browne, Zhongbo Chen, Jonathan William Brenton, Mina Ryten
{"title":"aws-s3-integrity-check: an open-source bash tool to verify the integrity of a dataset stored on Amazon S3.","authors":"Sonia García-Ruiz, Regina Hertfelder Reynolds, Melissa Grant-Peters, Emil Karl Gustavsson, Aine Fairbrother-Browne, Zhongbo Chen, Jonathan William Brenton, Mina Ryten","doi":"10.46471/gigabyte.87","DOIUrl":"10.46471/gigabyte.87","url":null,"abstract":"<p><p>Amazon Simple Storage Service (Amazon S3) is a widely used platform for storing large biomedical datasets. Unintended data alterations can occur during data writing and transmission, altering the original content and generating unexpected results. However, no open-source and easy-to-use tool exists to verify end-to-end data integrity. Here, we present <i>aws-s3-integrity-check</i>, a user-friendly, lightweight, and reliable bash tool to verify the integrity of a dataset stored in an Amazon S3 bucket. Using this tool, we only needed ∼114 min to verify the integrity of 1,045 records ranging between 5 bytes and 10 gigabytes and occupying ∼935 gigabytes of the Amazon S3 cloud. Our <i>aws-s3-integrity-check</i> tool also provides file-by-file on-screen and log-file-based information about the status of each integrity check. To our knowledge, this tool is the only open-source one that allows verifying the integrity of a dataset uploaded to the Amazon S3 Storage quickly, reliably, and efficiently. 
The tool is freely available for download and use at https://github.com/SoniaRuiz/aws-s3-integrity-check and https://hub.docker.com/r/soniaruiz/aws-s3-integrity-check.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte87"},"PeriodicalIF":0.0,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448181/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10165035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A dataset and template for assessing the ecological status of marine sediments and waters, based on microbial taxa.","authors":"Angel Borja","doi":"10.46471/gigabyte.86","DOIUrl":"10.46471/gigabyte.86","url":null,"abstract":"<p><p>Microbes have often been overlooked as indicators of how the ecological status is affected by human pressures. Recently, the biotic index microgAMBI was proposed to assess the status of marine sediments and waters, and it has been tested under different pressures and biogeographical areas. This index is based on the assignation of microbial taxa to one of two ecological groups: sensitive or tolerant to pollution or disturbance. The resulting taxa list has grown significantly since its first publication. Given the growing use of microgAMBI, it is crucial to make it more FAIR: Findable, Accessible, Interoperable and Reusable. Hence, this work provides the calculation template, the updated taxa list (1,974 taxa currently), and instructions on how to access and use them for assessing marine microbial ecological status.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte86"},"PeriodicalIF":0.0,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10427998/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10047074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pasquale Ciliberti, Astrid Roquas, Becky Desjardins, Bibiche Berkholst, Frank Loggen, Menno Hooft, Gideon Gijswijt, Dick de Graaff
{"title":"Digitizing the Culicidae collection of Naturalis Biodiversity Center, with a special focus on the former Bonne-Wepster subcollection.","authors":"Pasquale Ciliberti, Astrid Roquas, Becky Desjardins, Bibiche Berkholst, Frank Loggen, Menno Hooft, Gideon Gijswijt, Dick de Graaff","doi":"10.46471/gigabyte.85","DOIUrl":"10.46471/gigabyte.85","url":null,"abstract":"<p><p>Natural history collections contain a wealth of information on species diversity, distribution and ecology. However, due to historical and practical constraints, this valuable information is not always available to researchers. Our project aimed at unlocking data handwritten in notebooks owned by Johanna Bonne-Wepster, a Culicidae researcher. These handwritten notes refer to specimens labeled with a number only. The notebooks were scanned and entered into a Google spreadsheet. The specimens were provided with a unique identifier, labeled with the information from the notebooks and the data exported to the Global Biodiversity Information Facility. In addition, the type specimens were photographed. Besides Johanna Bonne-Wepster's collection, mosquitoes from the former Rijksmuseum van Natuurlijk Historie collection and the former Zoölogisch Museum Amsterdam Nederland collection were digitized. All specimens are now housed at the Naturalis Biodiversity Center museum in Leiden. 
This paper describes the efforts to mobilize this data and the problems we encountered.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte85"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355122/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10208256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sagar Patel, Zachary N Harris, Jason P Londo, Allison Miller, Anne Fennell
{"title":"Genome assembly of the hybrid grapevine <i>Vitis</i> 'Chambourcin'.","authors":"Sagar Patel, Zachary N Harris, Jason P Londo, Allison Miller, Anne Fennell","doi":"10.46471/gigabyte.84","DOIUrl":"10.46471/gigabyte.84","url":null,"abstract":"<p><p>'Chambourcin' is a French-American interspecific hybrid grape grown in the eastern and midwestern United States and used for making wine. Few genomic resources are available for hybrid grapevines like 'Chambourcin'. Here, we assembled the genome of 'Chambourcin' using PacBio HiFi long-read, Bionano optical map, and Illumina short-read sequencing technologies. We generated an assembly for 'Chambourcin' with 26 scaffolds, with an N50 length of 23.3 Mb and an estimated BUSCO completeness of 97.9%. We predicted 33,791 gene models and identified 16,056 common orthologs between 'Chambourcin', <i>V. vinifera</i> 'PN40024' 12X.v2, VCOST.v3, Shine Muscat and <i>V. riparia</i> Gloire. We found 1,606 plant transcription factors from 58 gene families. Finally, we identified 304,571 simple sequence repeats (up to six base pairs long). Our work provides the genome assembly, annotation and the protein and coding sequences of 'Chambourcin'. Our genome assembly is a valuable resource for genome comparisons, functional genomic analyses and genome-assisted breeding research.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte84"},"PeriodicalIF":0.0,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10318349/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10161639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paul Taconet, Barnabas Zogo, Dieudonné Diloma Soma, Ludovic P Ahoua Alou, Karine Mouline, Roch Kounbobr Dabiré, Alphonsine Amanan Koffi, Cédric Pennetier, Nicolas Moiroux
{"title":"<i>Anopheles</i> sampling collections in the health districts of Korhogo (Côte d'Ivoire) and Diébougou (Burkina Faso) between 2016 and 2018.","authors":"Paul Taconet, Barnabas Zogo, Dieudonné Diloma Soma, Ludovic P Ahoua Alou, Karine Mouline, Roch Kounbobr Dabiré, Alphonsine Amanan Koffi, Cédric Pennetier, Nicolas Moiroux","doi":"10.46471/gigabyte.83","DOIUrl":"10.46471/gigabyte.83","url":null,"abstract":"<p><p>Characterizing the entomological profile of malaria transmission at fine spatiotemporal scales is essential for developing and implementing effective vector control strategies. Here, we present a fine-grained dataset of <i>Anopheles</i> mosquitoes (Diptera: Culicidae) collected in 55 villages of the rural districts of Korhogo (Northern Côte d'Ivoire) and Diébougou (South-West Burkina Faso) between 2016 and 2018. In the framework of a randomized controlled trial, <i>Anopheles</i> mosquitoes were periodically collected by Human Landing Catches experts inside and outside households, and analyzed individually to identify the genus and, for a subsample, species, insecticide resistance genetic mutations, <i>Plasmodium falciparum</i> infection, and parity status. More than 3,000 collection sessions were carried out, achieving about 45,000 h of sampling efforts. Over 60,000 <i>Anopheles</i> were collected (mainly <i>A. gambiae</i> s.s., <i>A. coluzzii</i>, and <i>A. funestus</i>). 
The dataset is published as a Darwin Core archive in the Global Biodiversity Information Facility, comprising four files: events, occurrences, mosquito characterizations, and environmental data.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte83"},"PeriodicalIF":0.0,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10318348/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9803417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The genome assembly and annotation of the many-banded krait, <i>Bungarus multicinctus</i>.","authors":"Boyang Liu, Liangyu Cui, Zhangwen Deng, Yue Ma, Diancheng Yang, Yanan Gong, Yanchun Xu, Tianming Lan, Shuhui Yang, Song Huang","doi":"10.46471/gigabyte.82","DOIUrl":"10.46471/gigabyte.82","url":null,"abstract":"<p><p>Snakes are a vital component of wildlife resources and are widely distributed across the globe. The many-banded krait <i>Bungarus multicinctus</i> is a highly venomous snake found across Southern Asia and central and southern China. Snakes are an ancient reptile group, and their genomes can provide important clues for understanding the evolutionary history of reptiles. Additionally, genomic resources play a crucial role in comprehending the evolution of all species. However, snake genomic resources are still scarce. Here, we present a highly contiguous genome of <i>B. multicinctus</i> with a size of 1.51 Gb. The genome contains a repeat content of 40.15%, with a total length exceeding 620 Mb. Additionally, we annotated a total of 24,869 functional genes. This research is of great significance for comprehending the evolution of <i>B. multicinctus</i> and provides genomic information on the genes involved in venom gland functions.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte82"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10315667/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9802538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}