GigaScience, Pub Date: 2024-01-02, DOI: 10.1093/gigascience/giae036
Riccardo Baroncelli, José F Cobo-Díaz, Tiziano Benocci, Mao Peng, Evy Battaglia, Sajeet Haridas, William Andreopoulos, Kurt LaButti, Jasmyn Pangilinan, Anna Lipzen, Maxim Koriabine, Diane Bauer, Gaetan Le Floch, Miia R Mäkelä, Elodie Drula, Bernard Henrissat, Igor V Grigoriev, Jo Anne Crouch, Ronald P de Vries, Serenella A Sukno, Michael R Thon
{"title":"Genome evolution and transcriptome plasticity is associated with adaptation to monocot and dicot plants in Colletotrichum fungi.","authors":"Riccardo Baroncelli, José F Cobo-Díaz, Tiziano Benocci, Mao Peng, Evy Battaglia, Sajeet Haridas, William Andreopoulos, Kurt LaButti, Jasmyn Pangilinan, Anna Lipzen, Maxim Koriabine, Diane Bauer, Gaetan Le Floch, Miia R Mäkelä, Elodie Drula, Bernard Henrissat, Igor V Grigoriev, Jo Anne Crouch, Ronald P de Vries, Serenella A Sukno, Michael R Thon","doi":"10.1093/gigascience/giae036","DOIUrl":"10.1093/gigascience/giae036","url":null,"abstract":"<p><strong>Background: </strong>Colletotrichum fungi infect a wide diversity of monocot and dicot hosts, causing diseases on almost all economically important plants worldwide. Colletotrichum is also a suitable model for studying gene family evolution on a fine scale to uncover events in the genome associated with biological changes.</p><p><strong>Results: </strong>Here we present the genome sequences of 30 Colletotrichum species covering the diversity within the genus. Evolutionary analyses revealed that the Colletotrichum ancestor diverged in the late Cretaceous in parallel with the diversification of flowering plants. We provide evidence of independent host jumps from dicots to monocots during the evolution of Colletotrichum, coinciding with a progressive shrinking of the plant cell wall degradative arsenal and expansions in lineage-specific gene families. Comparative transcriptomics of 4 species adapted to different hosts revealed similarity in gene content but high diversity in the modulation of their transcription profiles on different plant substrates. 
Combining genomics and transcriptomics, we identified a set of core genes such as specific transcription factors, putatively involved in plant cell wall degradation.</p><p><strong>Conclusions: </strong>These results indicate that the ancestral Colletotrichum were associated with dicot plants and certain branches progressively adapted to different monocot hosts, reshaping the gene content and its regulation.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11212070/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141467455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience, Pub Date: 2024-01-02, DOI: 10.1093/gigascience/giae047
Tao Feng, Shufang Wu, Hongwei Zhou, Zhencheng Fang
{"title":"MOBFinder: a tool for mobilization typing of plasmid metagenomic fragments based on a language model.","authors":"Tao Feng, Shufang Wu, Hongwei Zhou, Zhencheng Fang","doi":"10.1093/gigascience/giae047","DOIUrl":"10.1093/gigascience/giae047","url":null,"abstract":"<p><strong>Background: </strong>Mobilization typing (MOB) is a classification scheme for plasmid genomes based on their relaxase gene. The host ranges of plasmids of different MOB categories are diverse, and MOB is crucial for investigating plasmid mobilization, especially the transmission of resistance genes and virulence factors. However, MOB typing of plasmid metagenomic data is challenging due to the highly fragmented characteristics of metagenomic contigs.</p><p><strong>Results: </strong>We developed MOBFinder, an 11-class classifier, for categorizing plasmid fragments into 10 MOB types and a nonmobilizable category. We first performed MOB typing to classify complete plasmid genomes according to relaxase information and then constructed an artificial benchmark dataset of plasmid metagenomic fragments (PMFs) from those complete plasmid genomes whose MOB types are well annotated. Next, based on natural language models, we used word vectors to characterize the PMFs. Several random forest classification models were trained and integrated to predict fragments of different lengths. Evaluating the tool using the benchmark dataset, we found that MOBFinder outperforms previous tools such as MOBscan and MOB-suite, with an overall accuracy approximately 59% higher than that of MOB-suite. Moreover, the balanced accuracy, harmonic mean, and F1-score reached up to 99% for some MOB types. 
When applied to a cohort of patients with type 2 diabetes (T2D), MOBFinder offered insights suggesting that the MOBF type plasmid, which is widely present in Escherichia and Klebsiella, and the MOBQ type plasmid might accelerate antibiotic resistance transmission in patients with T2D.</p><p><strong>Conclusions: </strong>To the best of our knowledge, MOBFinder is the first tool for MOB typing of PMFs. The tool is freely available at https://github.com/FengTaoSMU/MOBFinder.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299106/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141888938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience, Pub Date: 2024-01-02, DOI: 10.1093/gigascience/giae040
Kari Lavikka, Jaana Oikkonen, Yilin Li, Taru Muranen, Giulia Micoli, Giovanni Marchi, Alexandra Lahtinen, Kaisa Huhtinen, Rainer Lehtonen, Sakari Hietanen, Johanna Hynninen, Anni Virtanen, Sampsa Hautaniemi
{"title":"Deciphering cancer genomes with GenomeSpy: a grammar-based visualization toolkit.","authors":"Kari Lavikka, Jaana Oikkonen, Yilin Li, Taru Muranen, Giulia Micoli, Giovanni Marchi, Alexandra Lahtinen, Kaisa Huhtinen, Rainer Lehtonen, Sakari Hietanen, Johanna Hynninen, Anni Virtanen, Sampsa Hautaniemi","doi":"10.1093/gigascience/giae040","DOIUrl":"10.1093/gigascience/giae040","url":null,"abstract":"<p><strong>Background: </strong>Visualization is an indispensable facet of genomic data analysis. Despite the abundance of specialized visualization tools, there remains a distinct need for tailored solutions. However, their implementation typically requires extensive programming expertise from bioinformaticians and software developers, especially when building interactive applications. Toolkits based on visualization grammars offer a more accessible, declarative way to author new visualizations. Yet, current grammar-based solutions fall short in adequately supporting the interactive analysis of large datasets with extensive sample collections, a pivotal task often encountered in cancer research.</p><p><strong>Findings: </strong>We present GenomeSpy, a grammar-based toolkit for authoring tailored, interactive visualizations for genomic data analysis. By using combinatorial building blocks and a declarative language, users can implement new visualization designs easily and embed them in web pages or end-user-oriented applications. A distinctive element of GenomeSpy's architecture is its effective use of the graphics processing unit in all rendering, enabling a high frame rate and smoothly animated interactions, such as navigation within a genome. We demonstrate the utility of GenomeSpy by characterizing the genomic landscape of 753 ovarian cancer samples from patients in the DECIDER clinical trial. 
Our results expand the understanding of the genomic architecture in ovarian cancer, particularly the diversity of chromosomal instability.</p><p><strong>Conclusions: </strong>GenomeSpy is a visualization toolkit applicable to a wide range of tasks pertinent to genome analysis. It offers high flexibility and exceptional performance in interactive analysis. The toolkit is open source with an MIT license, implemented in JavaScript, and available at https://genomespy.app/.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299109/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141888976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early microbial intervention reshapes phenotypes of newborn Bos taurus through metabolic regulations.","authors":"Yizhao Shen, Yan Li, Tingting Wu, Quanbin Dong, Qiufeng Deng, Lu Liu, Yanfei Guo, Yufeng Cao, Qiufeng Li, Jing Shi, Huayiyang Zou, Yuwen Jiao, Luoyang Ding, Jianguo Li, Yanxia Gao, Shixian Hu, Yifeng Wang, Lianmin Chen","doi":"10.1093/gigascience/giad118","DOIUrl":"10.1093/gigascience/giad118","url":null,"abstract":"<p><strong>Background: </strong>The rumen of neonatal calves has limited functionality, and establishing intestinal microbiota may play a crucial role in their health and performance. Thus, we aim to explore the temporal colonization of the gut microbiome and the benefits of early microbial transplantation (MT) in newborn calves.</p><p><strong>Results: </strong>We followed 36 newborn calves for 2 months and found that the composition and ecological interactions of their gut microbiomes likely reached maturity 1 month after birth. Temporal changes in the gut microbiome of newborn calves are widely associated with changes in their physiological statuses, such as growth and fiber digestion. Importantly, we observed that MT reshapes the gut microbiome of newborns by altering the abundance and interaction of Bacteroides species, as well as amino acid pathways, such as arginine biosynthesis. Two-year follow-up of those calves further showed that MT improves their later milk production. Notably, MT improves fiber digestion and antioxidant capacity of newborns while reducing diarrhea. MT also contributes to significant changes in the metabolomic landscape, and with putative causal mediation analysis, we suggest that altered gut microbial composition in newborns may influence physiological status through microbial-derived metabolites.</p><p><strong>Conclusions: </strong>Our study provides a metagenomic and metabolomic atlas of the temporal development of the gut microbiome in newborn calves. 
MT can alter the gut microbiome of newborns, leading to improved physiological status and later milk production. The data may help develop strategies to manipulate the gut microbiota during early life, which may be relevant to the health and production of newborn calves.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10787367/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139466404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience, Pub Date: 2024-01-02, DOI: 10.1093/gigascience/giad116
Pedro G Nachtigall, Alan M Durham, Darin R Rokyta, Inácio L M Junqueira-de-Azevedo
{"title":"ToxCodAn-Genome: an automated pipeline for toxin-gene annotation in genome assembly of venomous lineages.","authors":"Pedro G Nachtigall, Alan M Durham, Darin R Rokyta, Inácio L M Junqueira-de-Azevedo","doi":"10.1093/gigascience/giad116","DOIUrl":"10.1093/gigascience/giad116","url":null,"abstract":"<p><strong>Background: </strong>The rapid development of sequencing technologies resulted in a wide expansion of genomics studies using venomous lineages. This facilitated research focusing on understanding the evolution of adaptive traits and the search for novel compounds that can be applied in agriculture and medicine. However, the toxin annotation of genomes is a laborious and time-consuming task, and no consensus pipeline is currently available. No computational tool currently exists to address the challenges specific to toxin annotation and to ensure the reproducibility of the process.</p><p><strong>Results: </strong>Here, we present ToxCodAn-Genome, the first software designed to perform automated toxin annotation in genomes of venomous lineages. This pipeline was designed to retrieve the full-length coding sequences of toxins and to allow the detection of novel truncated paralogs and pseudogenes. We tested ToxCodAn-Genome using 12 genomes of venomous lineages and achieved high performance on recovering their current toxin annotations. This tool can be easily customized to allow improvements in the final toxin annotation set and can be expanded to virtually any venomous lineage. ToxCodAn-Genome is fast, allowing it to run on any personal computer, but it can also be executed in multicore mode, taking advantage of large high-performance servers. In addition, we provide a guide to direct future research in the venomics field to ensure a confident toxin annotation in the genome being studied. 
As a case study, we sequenced and annotated the toxin repertoire of Bothrops alternatus, which may facilitate future evolutionary and biomedical studies using vipers as models.</p><p><strong>Conclusions: </strong>ToxCodAn-Genome is suitable to perform toxin annotation in the genome of venomous species and may help to improve the reproducibility of further studies. ToxCodAn-Genome and the guide are freely available at https://github.com/pedronachtigall/ToxCodAn-Genome.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10797961/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139502191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience, Pub Date: 2024-01-02, DOI: 10.1093/gigascience/giae073
Seth A Frazer, Mahdi Baghbanzadeh, Ali Rahnavard, Keith A Crandall, Todd H Oakley
{"title":"Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD).","authors":"Seth A Frazer, Mahdi Baghbanzadeh, Ali Rahnavard, Keith A Crandall, Todd H Oakley","doi":"10.1093/gigascience/giae073","DOIUrl":"https://doi.org/10.1093/gigascience/giae073","url":null,"abstract":"<p><strong>Background: </strong>Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families, including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax (the wavelength of maximum absorbance), which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype.</p><p><strong>Results: </strong>Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes that we call the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. 
We use VPOD data and deepBreaks to show regression-based machine learning (ML) models often reliably predict λmax, account for nonadditive effects of mutations on function, and identify functionally critical amino acid sites.</p><p><strong>Conclusion: </strong>The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism's ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11512451/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142498591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DriverMP enables improved identification of cancer driver genes","authors":"Yangyang Liu, Jiyun Han, Tongxin Kong, Nannan Xiao, Qinglin Mei, Juntao Liu","doi":"10.1093/gigascience/giad106","DOIUrl":"https://doi.org/10.1093/gigascience/giad106","url":null,"abstract":"Background: Cancer is widely regarded as a complex disease primarily driven by genetic mutations. A critical concern and significant obstacle lies in discerning driver genes amid an extensive array of passenger genes. Findings: We present a new method termed DriverMP for effectively prioritizing altered genes on a cancer-type level by considering mutated gene pairs. It is designed to first apply nonsilent somatic mutation data, protein‒protein interaction network data, and differential gene expression data to prioritize mutated gene pairs, and then individual mutated genes are prioritized based on prioritized mutated gene pairs. Application of this method in 10 cancer datasets from The Cancer Genome Atlas demonstrated its great improvements over all the compared state-of-the-art methods in identifying known driver genes. Then, a comprehensive analysis demonstrated the reliability of the novel driver genes that are strongly supported by clinical experiments, disease enrichment, or biological pathway analysis. Conclusions: The new method, DriverMP, which is able to identify driver genes by effectively integrating the advantages of multiple kinds of cancer data, is available at https://github.com/LiuYangyangSDU/DriverMP. In addition, we have developed a novel driver gene database for 10 cancer types and an online service that can be freely accessed without registration for users. 
The DriverMP method, the database of novel drivers, and the user-friendly online server are expected to contribute to new diagnostic and therapeutic opportunities for cancers.","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":9.2,"publicationDate":"2023-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138684540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience, Pub Date: 2023-03-20, Epub Date: 2023-03-27, DOI: 10.1093/gigascience/giad019
Andrzej Oleksa, Eliza Căuia, Adrian Siceanu, Zlatko Puškadija, Marin Kovačić, M Alice Pinto, Pedro João Rodrigues, Fani Hatjina, Leonidas Charistos, Maria Bouga, Janez Prešern, İrfan Kandemir, Slađan Rašić, Szilvia Kusza, Adam Tofilski
{"title":"Honey bee (Apis mellifera) wing images: a tool for identification and conservation.","authors":"Andrzej Oleksa, Eliza Căuia, Adrian Siceanu, Zlatko Puškadija, Marin Kovačić, M Alice Pinto, Pedro João Rodrigues, Fani Hatjina, Leonidas Charistos, Maria Bouga, Janez Prešern, İrfan Kandemir, Slađan Rašić, Szilvia Kusza, Adam Tofilski","doi":"10.1093/gigascience/giad019","DOIUrl":"10.1093/gigascience/giad019","url":null,"abstract":"<p><strong>Background: </strong>The honey bee (Apis mellifera) is an ecologically and economically important species that provides pollination services to natural and agricultural systems. The biodiversity of the honey bee in parts of its native range is endangered by migratory beekeeping and commercial breeding. In consequence, some honey bee populations that are well adapted to the local environment are threatened with extinction. A crucial step for the protection of honey bee biodiversity is reliable differentiation between native and nonnative bees. One of the methods that can be used for this is the geometric morphometrics of wings. This method is fast, is low cost, and does not require expensive equipment. Therefore, it can be easily used by both scientists and beekeepers. However, wing geometric morphometrics is challenging due to the lack of reference data that can be reliably used for comparisons between different geographic regions.</p><p><strong>Findings: </strong>Here, we provide an unprecedented collection of 26,481 honey bee wing images representing 1,725 samples from 13 European countries. The wing images are accompanied by the coordinates of 19 landmarks and the geographic coordinates of the sampling locations. We present an R script that describes the workflow for analyzing the data and identifying an unknown sample. 
We compared the data with available reference samples for lineage and found general agreement with them.</p><p><strong>Conclusions: </strong>The extensive collection of wing images available on the Zenodo website can be used to identify the geographic origin of unknown samples and therefore assist in the monitoring and conservation of honey bee biodiversity in Europe.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2023-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10041535/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9624369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience, Pub Date: 2023-03-20, DOI: 10.1093/gigascience/giad013
Chen Yang, Theodora Lo, Ka Ming Nip, Saber Hafezqorani, René L Warren, Inanc Birol
{"title":"Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim.","authors":"Chen Yang, Theodora Lo, Ka Ming Nip, Saber Hafezqorani, René L Warren, Inanc Birol","doi":"10.1093/gigascience/giad013","DOIUrl":"10.1093/gigascience/giad013","url":null,"abstract":"<p><strong>Background: </strong>Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment.</p><p><strong>Results: </strong>Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task.</p><p><strong>Conclusions: </strong>The Meta-NanoSim characterization module investigates read features, including chimeric information and abundance levels, while the simulation module simulates large and complex multisample microbial communities with different abundance profiles. 
All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2023-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025935/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9269978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience, Pub Date: 2023-03-20, Epub Date: 2023-03-27, DOI: 10.1093/gigascience/giad015
Maria Sindeeva, Nikolay Chekanov, Manvel Avetisian, Tatiana I Shashkova, Nikita Baranov, Elian Malkin, Alexander Lapin, Olga Kardymon, Veniamin Fishman
{"title":"Cell type-specific interpretation of noncoding variants using deep learning-based methods.","authors":"Maria Sindeeva, Nikolay Chekanov, Manvel Avetisian, Tatiana I Shashkova, Nikita Baranov, Elian Malkin, Alexander Lapin, Olga Kardymon, Veniamin Fishman","doi":"10.1093/gigascience/giad015","DOIUrl":"10.1093/gigascience/giad015","url":null,"abstract":"<p><p>Interpretation of noncoding genomic variants is one of the most important challenges in human genetics. Machine learning methods have emerged recently as a powerful tool to solve this problem. State-of-the-art approaches allow prediction of transcriptional and epigenetic effects caused by noncoding mutations. However, these approaches require specific experimental data for training and cannot generalize across cell types where required features were not experimentally measured. We show here that available epigenetic characteristics of human cell types are extremely sparse, limiting those approaches that rely on specific epigenetic input. We propose a new neural network architecture, DeepCT, which can learn complex interconnections of epigenetic features and infer unmeasured data from any available input. 
Furthermore, we show that DeepCT can learn cell type-specific properties, build biologically meaningful vector representations of cell types, and utilize these representations to generate cell type-specific predictions of the effects of noncoding variations in the human genome.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8,"publicationDate":"2023-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10041527/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9624368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}