GigaScience | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae043
Shuai Cao, Nunchanoke Sawettalake, Lisha Shen
{"title":"Gapless genome assembly and epigenetic profiles reveal gene regulation of whole-genome triplication in lettuce.","authors":"Shuai Cao, Nunchanoke Sawettalake, Lisha Shen","doi":"10.1093/gigascience/giae043","DOIUrl":"10.1093/gigascience/giae043","url":null,"abstract":"<p><strong>Background: </strong>Lettuce, an important member of the Asteraceae family, is a globally cultivated cash vegetable crop. With a highly complex genome (∼2.5 Gb; 2n = 18) rich in repeat sequences, current lettuce reference genomes exhibit thousands of gaps, impeding a comprehensive understanding of the lettuce genome.</p><p><strong>Findings: </strong>Here, we present a near-complete gapless reference genome for cutting lettuce with high transformability, using long-read PacBio HiFi and Nanopore sequencing data. In comparison to the stem lettuce genome, we identify 127,681 structural variations (SVs, present in 0.41 Gb of sequence), reflecting the divergence of leafy and stem lettuce. Interestingly, these SVs are related to transposons and DNA methylation states. Furthermore, we identify 4,612 whole-genome triplication genes exhibiting high expression levels associated with low DNA methylation levels and high N6-methyladenosine RNA modifications. 
DNA methylation changes are also associated with activation of genes involved in callus formation.</p><p><strong>Conclusions: </strong>Our gapless lettuce genome assembly, an unprecedented achievement in the Asteraceae family, establishes a solid foundation for functional genomics, epigenomics, and crop breeding and sheds new light on understanding the complexity of gene regulation associated with the dynamics of DNA and RNA epigenetics in genome evolution.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11238431/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141590091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata.","authors":"Nathan J LeRoy, Oleksandr Khoroshevskyi, Aaron O'Brien, Rafał Stępień, Alip Arslan, Nathan C Sheffield","doi":"10.1093/gigascience/giae033","DOIUrl":"10.1093/gigascience/giae033","url":null,"abstract":"<p><strong>Background: </strong>As biological data increase, we need additional infrastructure to share them and promote interoperability. While major effort has been put into sharing data, relatively less emphasis is placed on sharing metadata. Yet, sharing metadata is also important and in some ways has a wider scope than sharing data themselves.</p><p><strong>Results: </strong>Here, we present PEPhub, an approach to improve sharing and interoperability of biological metadata. PEPhub provides an API, natural-language search, and user-friendly web-based sharing and editing of sample metadata tables. We used PEPhub to process more than 100,000 published biological research projects and index them with fast semantic natural-language search. PEPhub thus provides a fast and user-friendly way to find existing biological research data or to share new data.</p><p><strong>Availability: </strong>https://pephub.databio.org.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11238423/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141590108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae050
Elena Castillo-Lorenzo, Elinor Breman, Pablo Gómez Barreiro, Juan Viruel
{"title":"Current status of global conservation and characterisation of wild and cultivated Brassicaceae genetic resources.","authors":"Elena Castillo-Lorenzo, Elinor Breman, Pablo Gómez Barreiro, Juan Viruel","doi":"10.1093/gigascience/giae050","DOIUrl":"10.1093/gigascience/giae050","url":null,"abstract":"<p><strong>Background: </strong>The economic importance of the globally distributed Brassicaceae family resides in the large diversity of crops within the family and the substantial variety of agronomic and functional traits they possess. We reviewed the current classifications of crop wild relatives (CWRs) in the Brassicaceae family with the aim of identifying new potential cross-compatible species from a total of 1,242 species using phylogenetic approaches.</p><p><strong>Results: </strong>In general, cross-compatibility data between wild species and crops, as well as phenotype and genotype characterisation data, were available for major crops but very limited for minor crops, restricting the identification of new potential CWRs. Around 70% of wild Brassicaceae did not have genetic sequence data available in public repositories, and only 40% had chromosome counts published. Using phylogenetic distances, we propose 103 new potential CWRs for this family, which we recommend as priorities for cross-compatibility tests with crops and for phenotypic characterisation, including 71 newly identified CWRs for 10 minor crops. From the total species used in this study, more than half had no records of being in ex situ conservation, and 80% were not assessed for their conservation status or were data deficient (IUCN Red List Assessments).</p><p><strong>Conclusions: </strong>Great efforts are needed on ex situ conservation to have accessible material for characterising and evaluating the species for future breeding programmes. 
We identified the Mediterranean region as one key conservation area for wild Brassicaceae species, with great numbers of endemic and threatened species. Conservation assessments are urgently needed to evaluate most of these wild Brassicaceae.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304946/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141901424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae049
Christian Gaser, Robert Dahnke, Paul M Thompson, Florian Kurth, Eileen Luders, The Alzheimer's Disease Neuroimaging Initiative
{"title":"CAT: a computational anatomy toolbox for the analysis of structural MRI data.","authors":"Christian Gaser, Robert Dahnke, Paul M Thompson, Florian Kurth, Eileen Luders, The Alzheimer's Disease Neuroimaging Initiative","doi":"10.1093/gigascience/giae049","DOIUrl":"10.1093/gigascience/giae049","url":null,"abstract":"<p><p>A large range of sophisticated brain image analysis tools have been developed by the neuroscience community, greatly advancing the field of human brain mapping. Here we introduce the Computational Anatomy Toolbox (CAT), a powerful suite of tools for brain morphometric analyses with an intuitive graphical user interface but also usable as a shell script. CAT is suitable for beginners, casual users, experts, and developers alike, providing a comprehensive set of analysis options, workflows, and integrated pipelines. The available analysis streams, illustrated on an example dataset, allow for voxel-based, surface-based, and region-based morphometric analyses. Notably, CAT incorporates multiple quality control options and covers the entire analysis workflow, including the preprocessing of cross-sectional and longitudinal data, statistical analysis, and the visualization of results. 
The overarching aim of this article is to provide a complete description and evaluation of CAT while offering a citable standard for the neuroscience community.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299546/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141893242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giad109
Zafran Hussain Shah, Marcel Müller, Wolfgang Hübner, Tung-Cheng Wang, Daniel Telman, Thomas Huser, Wolfram Schenck
{"title":"Evaluation of Swin Transformer and knowledge transfer for denoising of super-resolution structured illumination microscopy data.","authors":"Zafran Hussain Shah, Marcel Müller, Wolfgang Hübner, Tung-Cheng Wang, Daniel Telman, Thomas Huser, Wolfram Schenck","doi":"10.1093/gigascience/giad109","DOIUrl":"10.1093/gigascience/giad109","url":null,"abstract":"<p><strong>Background: </strong>Convolutional neural network (CNN)-based methods have shown excellent performance in denoising and reconstruction of super-resolved structured illumination microscopy (SR-SIM) data. Therefore, CNN-based architectures have been the focus of existing studies. However, Swin Transformer, an alternative and recently proposed deep learning-based image restoration architecture, has not been fully investigated for denoising SR-SIM images. Furthermore, it has not been fully explored how well transfer learning strategies work for denoising SR-SIM images with different noise characteristics and recorded cell structures for these different types of deep learning-based methods. Currently, the scarcity of publicly available SR-SIM datasets limits the exploration of the performance and generalization capabilities of deep learning methods.</p><p><strong>Results: </strong>In this work, we present SwinT-fairSIM, a novel method based on the Swin Transformer for restoring SR-SIM images with a low signal-to-noise ratio. The experimental results show that SwinT-fairSIM outperforms previous CNN-based denoising methods. Furthermore, as a second contribution, two types of transfer learning, namely direct transfer and fine-tuning, were benchmarked in combination with SwinT-fairSIM and CNN-based methods for denoising SR-SIM data. Direct transfer did not prove to be a viable strategy, but fine-tuning produced results comparable to conventional training from scratch while saving computational time and potentially reducing the amount of training data required. 
As a third contribution, we publish four datasets of raw SIM images and already reconstructed SR-SIM images. These datasets cover two different types of cell structures, tubulin filaments and vesicle structures. Different noise levels are available for the tubulin filaments.</p><p><strong>Conclusion: </strong>The SwinT-fairSIM method is well suited for denoising SR-SIM images. By fine-tuning, already trained models can be easily adapted to different noise characteristics and cell structures. Furthermore, the provided datasets are structured in a way that the research community can readily use them for research on denoising, super-resolution, and transfer learning strategies.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10787368/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139466408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giad107
Yujin Pu, Yang Zhou, Jun Liu, Haibin Zhang
{"title":"A high-quality chromosomal genome assembly of the sea cucumber Chiridota heheva and its hydrothermal adaptation.","authors":"Yujin Pu, Yang Zhou, Jun Liu, Haibin Zhang","doi":"10.1093/gigascience/giad107","DOIUrl":"10.1093/gigascience/giad107","url":null,"abstract":"<p><strong>Background: </strong>Chiridota heheva is a cosmopolitan holothurian well adapted to diverse deep-sea ecosystems, especially chemosynthetic environments. Besides high hydrostatic pressure and limited light, high concentrations of metal ions also represent harsh conditions in hydrothermal environments. Few holothurian species can live in such extreme conditions. Therefore, it is valuable to elucidate the adaptive genetic mechanisms of C. heheva in hydrothermal environments.</p><p><strong>Findings: </strong>Herein, we report a high-quality reference genome assembly of C. heheva from the Kairei vent, which is the first chromosome-level genome of Apodida. The chromosome-level genome size was 1.43 Gb, with a scaffold N50 of 53.24 Mb and BUSCO completeness score of 94.5%. Contig sequences were clustered, ordered, and assembled into 19 natural chromosomes. Comparative genome analysis found that the expanded gene families and positively selected genes of C. heheva were involved in the DNA damage repair process. The expanded gene families and the unique genes contributed to maintaining iron homeostasis in an iron-enriched environment. The positively selected gene RFC2 with 10 positively selected sites played an essential role in DNA repair under extreme environments.</p><p><strong>Conclusions: </strong>This first chromosome-level genome assembly of C. heheva reveals the hydrothermal adaptation of holothurians. 
As the first chromosome-level genome of order Apodida, this genome will provide the resource for investigating the evolution of class Holothuroidea.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10764150/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139086481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The first high-altitude autotetraploid haplotype-resolved genome assembled (Rhododendron nivale subsp. boreale) provides new insights into mountaintop adaptation.","authors":"Zhen-Yu Lyu, Xiong-Li Zhou, Si-Qi Wang, Gao-Ming Yang, Wen-Guang Sun, Jie-Yu Zhang, Rui Zhang, Shi-Kang Shen","doi":"10.1093/gigascience/giae052","DOIUrl":"10.1093/gigascience/giae052","url":null,"abstract":"<p><strong>Background: </strong>Rhododendron nivale subsp. boreale Philipson et M. N. Philipson is an alpine woody species with ornamental qualities that serves as the predominant species in mountainous scrub habitats found at an altitude of ∼4,200 m. As a high-altitude woody polyploid, this species may serve as a model to understand how plants adapt to alpine environments. Despite its ecological significance, the lack of genomic resources has hindered a comprehensive understanding of its evolutionary and adaptive characteristics in high-altitude mountainous environments.</p><p><strong>Findings: </strong>We sequenced and assembled the genome of R. nivale subsp. boreale, an assembly of the first subgenus Rhododendron and the first high-altitude woody flowering tetraploid, contributing an important genomic resource for alpine woody flora. The assembly included 52 pseudochromosomes (scaffold N50 = 42.93 Mb; BUSCO = 98.8%; QV = 45.51; S-AQI = 98.69), which belonged to 4 haplotypes, harboring 127,810 predicted protein-coding genes. Conjoint k-mer analysis, collinearity assessment, and phylogenetic investigation corroborated autotetraploid identity. Comparative genomic analysis revealed that R. nivale subsp. boreale originated as a neopolyploid of R. nivale and underwent 2 rounds of ancient polyploidy events. Transcriptional expression analysis showed that differences in expression between alleles were common and randomly distributed in the genome. 
We identified extended gene families and signatures of positive selection that are involved not only in adaptation to the mountaintop ecosystem (response to stress and developmental regulation) but also in autotetraploid reproduction (meiotic stabilization). Additionally, the expression levels of the (group VII ethylene response factor transcription factors) ERF VIIs were significantly higher than the mean global gene expression. We suspect that these changes have enabled the success of this species at high altitudes.</p><p><strong>Conclusions: </strong>We assembled the first high-altitude autopolyploid genome and achieved chromosome-level assembly within the subgenus Rhododendron. In addition, a high-altitude adaptation strategy of R. nivale subsp. boreale was reasonably speculated. This study provides valuable data for the exploration of alpine mountaintop adaptations and the correlation between extreme environments and species polyploidization.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304948/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141901426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae059
Guowei Chen, Jingzhe Jiang, Yanni Sun
{"title":"RNAVirHost: a machine learning-based method for predicting hosts of RNA viruses through viral genomes.","authors":"Guowei Chen, Jingzhe Jiang, Yanni Sun","doi":"10.1093/gigascience/giae059","DOIUrl":"10.1093/gigascience/giae059","url":null,"abstract":"<p><strong>Background: </strong>High-throughput sequencing technologies have revolutionized the identification of novel RNA viruses. Given that viruses are infectious agents, identifying hosts of these new viruses carries significant implications for public health and provides valuable insights into the dynamics of the microbiome. However, determining the hosts of these newly discovered viruses is not always straightforward, especially in the case of viruses detected in environmental samples. Even for host-associated samples, it is not always correct to assign the sample origin as the host of the identified viruses. The process of assigning hosts to RNA viruses remains challenging due to their high mutation rates and vast diversity.</p><p><strong>Results: </strong>In this study, we introduce RNAVirHost, a machine learning-based tool that predicts the hosts of RNA viruses solely based on viral genomes. RNAVirHost is a hierarchical classification framework that predicts hosts at different taxonomic levels. We demonstrate the superior accuracy of RNAVirHost in predicting hosts of RNA viruses through comprehensive comparisons with various state-of-the-art techniques. When applied to viruses from novel genera, RNAVirHost achieved the highest accuracy of 84.3%, outperforming the alignment-based strategy by 12.1%.</p><p><strong>Conclusions: </strong>The application of machine learning models has proven beneficial in predicting hosts of RNA viruses. By integrating genomic traits and sequence homologies, RNAVirHost provides a cost-effective and efficient strategy for host prediction. 
We believe that RNAVirHost can greatly assist in RNA virus analyses and contribute to pandemic surveillance.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11340644/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142035646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae016
Bonson Wong, James M Ferguson, Jessica Y Do, Hasindu Gamaarachchi, Ira W Deveson
{"title":"Streamlining remote nanopore data access with slow5curl.","authors":"Bonson Wong, James M Ferguson, Jessica Y Do, Hasindu Gamaarachchi, Ira W Deveson","doi":"10.1093/gigascience/giae016","DOIUrl":"10.1093/gigascience/giae016","url":null,"abstract":"<p><strong>Background: </strong>As adoption of nanopore sequencing technology continues to advance, the need to maintain large volumes of raw current signal data for reanalysis with updated algorithms is a growing challenge. Here we introduce slow5curl, a software package designed to streamline nanopore data sharing, accessibility, and reanalysis.</p><p><strong>Results: </strong>Slow5curl allows a user to fetch a specified read or group of reads from a raw nanopore dataset stored on a remote server, such as a public data repository, without downloading the entire file. Slow5curl uses an index to quickly fetch specific reads from a large dataset in SLOW5/BLOW5 format and highly parallelized data access requests to maximize download speeds. 
Using all public nanopore data from the Human Pangenome Reference Consortium (>22 TB), we demonstrate how slow5curl can be used to quickly fetch and reanalyze raw signal reads corresponding to a set of target genes from each individual in a large cohort dataset (n = 91), minimizing the time, egress costs, and local storage requirements for their reanalysis.</p><p><strong>Conclusions: </strong>We provide slow5curl as a free, open-source package that will reduce frictions in data sharing for the nanopore community: https://github.com/BonsonW/slow5curl.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11010652/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140848401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2024-01-02 | DOI: 10.1093/gigascience/giae028
Chao Yang, Zhenmiao Zhang, Yufen Huang, Xuefeng Xie, Herui Liao, Jin Xiao, Werner Pieter Veldsman, Kejing Yin, Xiaodong Fang, Lu Zhang
{"title":"LRTK: a platform agnostic toolkit for linked-read analysis of both human genome and metagenome.","authors":"Chao Yang, Zhenmiao Zhang, Yufen Huang, Xuefeng Xie, Herui Liao, Jin Xiao, Werner Pieter Veldsman, Kejing Yin, Xiaodong Fang, Lu Zhang","doi":"10.1093/gigascience/giae028","DOIUrl":"10.1093/gigascience/giae028","url":null,"abstract":"<p><strong>Background: </strong>Linked-read sequencing technologies generate high-base quality short reads that contain extrapolative information on long-range DNA connectedness. These advantages of linked-read technologies are well known and have been demonstrated in many human genomic and metagenomic studies. However, existing linked-read analysis pipelines (e.g., Long Ranger) were primarily developed to process sequencing data from the human genome and are not suited for analyzing metagenomic sequencing data. Moreover, linked-read analysis pipelines are typically limited to 1 specific sequencing platform.</p><p><strong>Findings: </strong>To address these limitations, we present the Linked-Read ToolKit (LRTK), a unified and versatile toolkit for platform agnostic processing of linked-read sequencing data from both human genome and metagenome. LRTK provides functions to perform linked-read simulation, barcode sequencing error correction, barcode-aware read alignment and metagenome assembly, reconstruction of long DNA fragments, taxonomic classification and quantification, and barcode-assisted genomic variant calling and phasing. LRTK has the ability to process multiple samples automatically and provides users with the option to generate reproducible reports during processing of raw sequencing data and at multiple checkpoints throughout downstream analysis. We applied LRTK on linked reads from simulation, mock community, and real datasets for both human genome and metagenome. 
We showcased LRTK's ability to generate comparative performance results from preceding benchmark studies and to report these results in publication-ready HTML document plots.</p><p><strong>Conclusions: </strong>LRTK provides comprehensive and flexible modules along with an easy-to-use Python-based workflow for processing linked-read sequencing datasets, thereby filling the current gap in the field caused by platform-centric genome-specific linked-read data analysis tools.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11170215/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141310460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}