Latest GigaScience articles

Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD).
IF 11.8, Zone 2, Biology
GigaScience Pub Date : 2024-01-02 DOI: 10.1093/gigascience/giae073
Seth A Frazer, Mahdi Baghbanzadeh, Ali Rahnavard, Keith A Crandall, Todd H Oakley
Abstract
Background: Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families, including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax (the wavelength of maximum absorbance), which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype.
Results: Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes that we call the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show regression-based machine learning (ML) models often reliably predict λmax, account for nonadditive effects of mutations on function, and identify functionally critical amino acid sites.
Conclusion: The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism's ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.
Volume: 13. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11512451/pdf/
Citations: 0
Evolutionary genomics of three agricultural pest moths reveals rapid evolution of host adaptation and immune-related genes.
IF 3.5, Zone 2, Biology
GigaScience Pub Date : 2024-01-02 DOI: 10.1093/gigascience/giad103
Yi-Ming Weng, Pathour R Shashank, R Keating Godfrey, David Plotkin, Brandon M Parker, Tyler Wist, Akito Y Kawahara
Abstract
Background: Understanding the genotype of pest species provides an important baseline for designing integrated pest management (IPM) strategies. Recently developed long-read sequence technologies make it possible to compare genomic features of nonmodel pest species to disclose the evolutionary path underlying the pest species profiles. Here we sequenced and assembled genomes for 3 agricultural pest gelechiid moths: Phthorimaea absoluta (tomato leafminer), Keiferia lycopersicella (tomato pinworm), and Scrobipalpa atriplicella (goosefoot groundling moth). We also compared genomes of tomato leafminer and tomato pinworm with published genomes of Phthorimaea operculella and Pectinophora gossypiella to investigate the gene family evolution related to the pest species profiles.
Results: We found that the 3 solanaceous feeding species, P. absoluta, K. lycopersicella, and P. operculella, are clustered together. Gene family evolution analyses with the 4 species show clear gene family expansions on host plant-associated genes for the 3 solanaceous feeding species. These genes are involved in host compound sensing (e.g., gustatory receptors), detoxification (e.g., ABC transporter C family, cytochrome P450, glucose-methanol-choline oxidoreductase, insect cuticle proteins, and UDP-glucuronosyl), and digestion (e.g., serine proteases and peptidase family S1). A gene ontology enrichment analysis of rapid evolving genes also suggests enriched functions in host sensing and immunity.
Conclusions: Our results of family evolution analyses indicate that host plant adaptation and pathogen defense could be important drivers in species diversification among gelechiid moths.
Volume: 13. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759296/pdf/
Citations: 0
Genome evolution and transcriptome plasticity is associated with adaptation to monocot and dicot plants in Colletotrichum fungi.
IF 3.5, Zone 2, Biology
GigaScience Pub Date : 2024-01-02 DOI: 10.1093/gigascience/giae036
Riccardo Baroncelli, José F Cobo-Díaz, Tiziano Benocci, Mao Peng, Evy Battaglia, Sajeet Haridas, William Andreopoulos, Kurt LaButti, Jasmyn Pangilinan, Anna Lipzen, Maxim Koriabine, Diane Bauer, Gaetan Le Floch, Miia R Mäkelä, Elodie Drula, Bernard Henrissat, Igor V Grigoriev, Jo Anne Crouch, Ronald P de Vries, Serenella A Sukno, Michael R Thon
Abstract
Background: Colletotrichum fungi infect a wide diversity of monocot and dicot hosts, causing diseases on almost all economically important plants worldwide. Colletotrichum is also a suitable model for studying gene family evolution on a fine scale to uncover events in the genome associated with biological changes.
Results: Here we present the genome sequences of 30 Colletotrichum species covering the diversity within the genus. Evolutionary analyses revealed that the Colletotrichum ancestor diverged in the late Cretaceous in parallel with the diversification of flowering plants. We provide evidence of independent host jumps from dicots to monocots during the evolution of Colletotrichum, coinciding with a progressive shrinking of the plant cell wall degradative arsenal and expansions in lineage-specific gene families. Comparative transcriptomics of 4 species adapted to different hosts revealed similarity in gene content but high diversity in the modulation of their transcription profiles on different plant substrates. Combining genomics and transcriptomics, we identified a set of core genes such as specific transcription factors, putatively involved in plant cell wall degradation.
Conclusions: These results indicate that the ancestral Colletotrichum were associated with dicot plants and certain branches progressively adapted to different monocot hosts, reshaping the gene content and its regulation.
Volume: 13. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11212070/pdf/
Citations: 0
MOBFinder: a tool for mobilization typing of plasmid metagenomic fragments based on a language model.
IF 3.5, Zone 2, Biology
GigaScience Pub Date : 2024-01-02 DOI: 10.1093/gigascience/giae047
Tao Feng, Shufang Wu, Hongwei Zhou, Zhencheng Fang
Abstract
Background: Mobilization typing (MOB) is a classification scheme for plasmid genomes based on their relaxase gene. The host ranges of plasmids of different MOB categories are diverse, and MOB is crucial for investigating plasmid mobilization, especially the transmission of resistance genes and virulence factors. However, MOB typing of plasmid metagenomic data is challenging due to the highly fragmented characteristics of metagenomic contigs.
Results: We developed MOBFinder, an 11-class classifier, for categorizing plasmid fragments into 10 MOB types and a nonmobilizable category. We first performed MOB typing to classify complete plasmid genomes according to relaxase information and then constructed an artificial benchmark dataset of plasmid metagenomic fragments (PMFs) from those complete plasmid genomes whose MOB types are well annotated. Next, based on natural language models, we used word vectors to characterize the PMFs. Several random forest classification models were trained and integrated to predict fragments of different lengths. Evaluating the tool using the benchmark dataset, we found that MOBFinder outperforms previous tools such as MOBscan and MOB-suite, with an overall accuracy approximately 59% higher than that of MOB-suite. Moreover, the balanced accuracy, harmonic mean, and F1-score reached up to 99% for some MOB types. When applied to a cohort of patients with type 2 diabetes (T2D), MOBFinder offered insights suggesting that the MOBF type plasmid, which is widely present in Escherichia and Klebsiella, and the MOBQ type plasmid might accelerate antibiotic resistance transmission in patients with T2D.
Conclusions: To the best of our knowledge, MOBFinder is the first tool for MOB typing of PMFs. The tool is freely available at https://github.com/FengTaoSMU/MOBFinder.
Volume: 13. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299106/pdf/
Citations: 0
Deciphering cancer genomes with GenomeSpy: a grammar-based visualization toolkit.
IF 3.5, Zone 2, Biology
GigaScience Pub Date : 2024-01-02 DOI: 10.1093/gigascience/giae040
Kari Lavikka, Jaana Oikkonen, Yilin Li, Taru Muranen, Giulia Micoli, Giovanni Marchi, Alexandra Lahtinen, Kaisa Huhtinen, Rainer Lehtonen, Sakari Hietanen, Johanna Hynninen, Anni Virtanen, Sampsa Hautaniemi
Abstract
Background: Visualization is an indispensable facet of genomic data analysis. Despite the abundance of specialized visualization tools, there remains a distinct need for tailored solutions. However, their implementation typically requires extensive programming expertise from bioinformaticians and software developers, especially when building interactive applications. Toolkits based on visualization grammars offer a more accessible, declarative way to author new visualizations. Yet, current grammar-based solutions fall short in adequately supporting the interactive analysis of large datasets with extensive sample collections, a pivotal task often encountered in cancer research.
Findings: We present GenomeSpy, a grammar-based toolkit for authoring tailored, interactive visualizations for genomic data analysis. By using combinatorial building blocks and a declarative language, users can implement new visualization designs easily and embed them in web pages or end-user-oriented applications. A distinctive element of GenomeSpy's architecture is its effective use of the graphics processing unit in all rendering, enabling a high frame rate and smoothly animated interactions, such as navigation within a genome. We demonstrate the utility of GenomeSpy by characterizing the genomic landscape of 753 ovarian cancer samples from patients in the DECIDER clinical trial. Our results expand the understanding of the genomic architecture in ovarian cancer, particularly the diversity of chromosomal instability.
Conclusions: GenomeSpy is a visualization toolkit applicable to a wide range of tasks pertinent to genome analysis. It offers high flexibility and exceptional performance in interactive analysis. The toolkit is open source with an MIT license, implemented in JavaScript, and available at https://genomespy.app/.
Volume: 13. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299109/pdf/
Citations: 0
Design and implementation of a scalable high-performance computing (HPC) cluster for omics data analysis: achievements, challenges and recommendations in LMICs.
IF 11.8, Zone 2, Biology
GigaScience Pub Date : 2024-01-02 DOI: 10.1093/gigascience/giae060
Kais Ghedira, Oussema Khamessi, Chaima Hkimi, Selim Kamoun, Nader Dhamer, Kamel Daassi, Wassim Ben Salah, Houcemeddine Othman, Wahbi Belhadj, Youssef Ghorbal
Abstract
Background: The advent of high-throughput technologies, including cutting-edge sequencing devices, has revolutionized biomedical data generation and processing. Nevertheless, big data applications require novel hardware and software for parallel computing and management to handle the ever-growing data size and analysis complexity. On-premise, high-performance computing (HPC) is increasingly used in biomedical research for big data stewardship.
Findings: In this work, we present Tunisia's first high-performance computational infrastructure for omics research.
Method: We highlight measurements and recommendations that may help institutions in other low- and middle-income countries that are eager to implement local HPC in facilities for bioinformatics research and omics data analyses.
Volume: 13. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11340639/pdf/
Citations: 0
TooManyCellsInteractive: A visualization tool for dynamic exploration of single-cell data.
IF 3.5, Zone 2, Biology
GigaScience Pub Date : 2024-01-02 DOI: 10.1093/gigascience/giae056
Conor Klamann, Christie J Lau, Javier Ruiz-Ramírez, Gregory W Schwartz
Abstract
Background: As single-cell sequencing technologies continue to advance, the growing volume and complexity of the ensuing data present new analytical challenges. Large cellular populations from single-cell atlases are more difficult to visualize and require extensive processing to identify biologically relevant subpopulations. Managing these workflows is also laborious for technical users and unintuitive for nontechnical users.
Results: We present TooManyCellsInteractive (TMCI), a browser-based JavaScript application for interactive exploration of cell populations. TMCI provides an intuitive interface to visualize and manipulate a radial tree representation of hierarchical cell subpopulations and allows users to easily overlay, filter, and compare biological features at multiple resolutions. Here we describe the software architecture and demonstrate how we used TMCI in a pan-cancer analysis to identify unique survival pathways among drug-tolerant persister cells.
Conclusions: TMCI will facilitate exploration and visualization of large-scale sequencing data in a user-friendly way. TMCI is freely available at https://github.com/schwartzlab-methods/too-many-cells-interactive. An example tree from data within this article is available at https://tmci.schwartzlab.ca/.
Volume: 13. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11340645/pdf/
Citations: 0
DriverMP enables improved identification of cancer driver genes
IF 9.2, Zone 2, Biology
GigaScience Pub Date : 2023-12-13 DOI: 10.1093/gigascience/giad106
Yangyang Liu, Jiyun Han, Tongxin Kong, Nannan Xiao, Qinglin Mei, Juntao Liu
Abstract
Background: Cancer is widely regarded as a complex disease primarily driven by genetic mutations. A critical concern and significant obstacle lies in discerning driver genes amid an extensive array of passenger genes.
Findings: We present a new method termed DriverMP for effectively prioritizing altered genes on a cancer-type level by considering mutated gene pairs. It is designed to first apply nonsilent somatic mutation data, protein-protein interaction network data, and differential gene expression data to prioritize mutated gene pairs, and then individual mutated genes are prioritized based on prioritized mutated gene pairs. Application of this method in 10 cancer datasets from The Cancer Genome Atlas demonstrated its great improvements over all the compared state-of-the-art methods in identifying known driver genes. Then, a comprehensive analysis demonstrated the reliability of the novel driver genes that are strongly supported by clinical experiments, disease enrichment, or biological pathway analysis.
Conclusions: The new method, DriverMP, which is able to identify driver genes by effectively integrating the advantages of multiple kinds of cancer data, is available at https://github.com/LiuYangyangSDU/DriverMP. In addition, we have developed a novel driver gene database for 10 cancer types and an online service that can be freely accessed without registration for users. The DriverMP method, the database of novel drivers, and the user-friendly online server are expected to contribute to new diagnostic and therapeutic opportunities for cancers.
Volume: 8 1. Publication type: Journal Article.
Citations: 0
Honey bee (Apis mellifera) wing images: a tool for identification and conservation.
IF 3.5, Zone 2, Biology
GigaScience Pub Date : 2023-03-20 Epub Date: 2023-03-27 DOI: 10.1093/gigascience/giad019
Andrzej Oleksa, Eliza Căuia, Adrian Siceanu, Zlatko Puškadija, Marin Kovačić, M Alice Pinto, Pedro João Rodrigues, Fani Hatjina, Leonidas Charistos, Maria Bouga, Janez Prešern, İrfan Kandemir, Slađan Rašić, Szilvia Kusza, Adam Tofilski
Abstract
Background: The honey bee (Apis mellifera) is an ecologically and economically important species that provides pollination services to natural and agricultural systems. The biodiversity of the honey bee in parts of its native range is endangered by migratory beekeeping and commercial breeding. In consequence, some honey bee populations that are well adapted to the local environment are threatened with extinction. A crucial step for the protection of honey bee biodiversity is reliable differentiation between native and nonnative bees. One of the methods that can be used for this is the geometric morphometrics of wings. This method is fast, is low cost, and does not require expensive equipment. Therefore, it can be easily used by both scientists and beekeepers. However, wing geometric morphometrics is challenging due to the lack of reference data that can be reliably used for comparisons between different geographic regions.
Findings: Here, we provide an unprecedented collection of 26,481 honey bee wing images representing 1,725 samples from 13 European countries. The wing images are accompanied by the coordinates of 19 landmarks and the geographic coordinates of the sampling locations. We present an R script that describes the workflow for analyzing the data and identifying an unknown sample. We compared the data with available reference samples for lineage and found general agreement with them.
Conclusions: The extensive collection of wing images available on the Zenodo website can be used to identify the geographic origin of unknown samples and therefore assist in the monitoring and conservation of honey bee biodiversity in Europe.
Volume: 12. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10041535/pdf/
Citations: 0
Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim.
IF 3.5, Zone 2, Biology
GigaScience Pub Date : 2023-03-20 DOI: 10.1093/gigascience/giad013
Chen Yang, Theodora Lo, Ka Ming Nip, Saber Hafezqorani, René L Warren, Inanc Birol
Abstract
Background: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment.
Results: Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task.
Conclusions: The Meta-NanoSim characterization module investigates read features, including chimeric information and abundance levels, while the simulation module simulates large and complex multisample microbial communities with different abundance profiles. All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.
Volume: 12. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025935/pdf/
Citations: 0