GigaScience: Latest Articles

Deepdefense: annotation of immune systems in prokaryotes using deep learning.
IF 11.8 · Zone 2 (Biology)
GigaScience · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae062
Sven Hauns, Omer S Alkhnbashi, Rolf Backofen
Background: Because of a constant evolutionary arms race, archaea and bacteria have evolved an abundance and diversity of immune responses to protect themselves against phages. Since the discovery and application of CRISPR-Cas adaptive immune systems, numerous novel candidate immune systems have been identified. Previous approaches to identifying these new immune systems rely on hidden Markov model (HMM)-based homolog searches or on labor-intensive and costly wet-lab experiments. To aid in finding and classifying immune systems in genomes, we use machine learning to classify already known immune system proteins and to discover potential candidates in the genome. Neural networks have shown promising results in classifying and predicting protein function in recent years. However, these methods often operate under the closed-world assumption, which presumes that all potential outcomes or classes are already known and included in the training dataset. This assumption does not always hold in real-world scenarios such as genomics, where new samples that were not accounted for in the training phase can emerge.
Results: In this work, we explore neural networks for immune protein classification, examine different methods for rejecting unrelated proteins in a genome-wide search, and establish a benchmark. We then optimize our approach for accuracy. Based on this, we develop an algorithm called Deepdefense to predict immune cassette classes from a genome. This design distinguishes immune system-related from unrelated proteins by analyzing variations in model-predicted confidence values, aiding the identification of both known and potentially novel immune system proteins. Finally, we test our approach for detecting immune systems in the genome against an HMM-based method.
Conclusions: Deepdefense can automatically detect genes and define cassette annotations and classifications using 2 models. This is achieved by combining an optimized deep learning model for annotating immune systems with calibration methods, and a second model that enables scanning of an entire genome.
Citations: 0
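The abstract describes separating immune-related from unrelated proteins by examining calibrated model confidence. The Python sketch below illustrates that general open-set idea. It is not Deepdefense's actual code. It assumes a classifier that outputs one logit per known immune-system class and uses temperature scaling as a stand-in calibration method. Any protein whose top calibrated probability falls below a threshold is rejected. The class names, threshold, and temperature are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a temperature > 1 softens overconfident outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_reject(
    logits: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.9,    # hypothetical rejection threshold
    temperature: float = 1.0,  # hypothetical calibration temperature
) -> Optional[str]:
    """Return the top class, or None if its calibrated confidence is too low."""
    probs = softmax(logits, temperature)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else None


# Hypothetical class names, for illustration only.
labels = ["CRISPR-Cas", "restriction-modification", "abortive infection"]
print(classify_or_reject([8.0, 0.5, 0.1], labels))  # CRISPR-Cas
print(classify_or_reject([1.0, 0.9, 0.8], labels))  # None (flat logits)
```

When the logits are well separated, the top class is returned. When the logits are flat, or a higher temperature softens the distribution, the protein is rejected as possibly unrelated or novel. In practice, the temperature would be fitted on held-out data and the threshold chosen from a benchmark.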
High-quality assembly of the T2T genome for Isodon rubescens f. lushanensis reveals genomic structure variations between 2 typical forms of Isodon rubescens.
IF 11.8 · Zone 2 (Biology)
GigaScience · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae075
Hao Yang, Conglong Lian, Jinlu Liu, Hongwei Yu, Le Zhao, Ni He, Xiuyu Liu, Shujuan Xue, Xiaoya Sun, Liping Zhang, Lili Wang, Jingfan Yang, Yu Fu, Rui Ma, Bao Zhang, Lidan Ye, Suiqing Chen
Background: Rabdosiae rubescentis herba (Isodon rubescens) is widely used in China as a folk medicine to treat esophageal cancer and sore throat. Its germplasm resources in China are abundant, with I. rubescens (Hemsl.) Hara and I. rubescens f. lushanensis as 2 typical forms. I. rubescens (Hemsl.) Hara is characterized by biosynthesis of the diterpenoid oridonin, which has strong anticancer activity, while I. rubescens f. lushanensis produces another anticancer diterpenoid, lushanrubescensin. However, the biosynthetic pathways of both compounds have yet to be fully understood. In particular, little is known about the genetic background of I. rubescens f. lushanensis.
Findings: Using the Pacific Biosciences (PacBio) single-molecule real-time and Nanopore ultra-long sequencing platforms, we obtained 139.07 Gb of high-quality data, with a sequencing depth of about 328×. We also obtained a high-quality reference genome for I. rubescens f. lushanensis, with a genome size of 349 Mb and a contig N50 of 28.8 Mb. The heterozygosity of the genome is 1.7%, and its repeat content is 83.43%. In total, 34,865 protein-coding genes were predicted. Moreover, we found that most of the variant or unique genes in the diterpenoid synthesis pathways of I. rubescens f. lushanensis and I. rubescens (Hemsl.) Hara were enriched in diterpene synthases.
Conclusions: We provide the first genome sequence and gene annotation for I. rubescens f. lushanensis. Together, they provide molecular evidence for understanding the chemotypic differences within I. rubescens.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11466039/pdf/
Citations: 0
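The Findings above report a contig N50 of 28.8 Mb. For readers unfamiliar with the metric, here is a minimal Python sketch of the conventional N50 definition. N50 is the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. The toy lengths are invented for illustration and are unrelated to the Isodon assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # first contig that pushes coverage past 50%
            return length
    return ordered[-1]  # not reached for non-negative lengths


# Toy contig lengths (arbitrary units): total 54, half 27 -> 10 + 9 + 8 = 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```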
The genomes of 5 underutilized Papilionoideae crops provide insights into root nodulation and disease resistance.
IF 11.8 · Zone 2 (Biology)
GigaScience · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae063
Lihua Yuan, Lihong Lei, Fan Jiang, Anqi Wang, Rong Chen, Hengchao Wang, Sihan Meng, Wei Fan
Background: The Papilionoideae subfamily contains a large number of underutilized legume crops, which are important for food security and human sustainability. However, the lack of genomic resources has hindered the breeding and utilization of these crops.
Results: Here, we present chromosome-level reference genomes for 5 underutilized diploid Papilionoideae crops: sword bean (Canavalia gladiata), scarlet runner bean (Phaseolus coccineus), winged bean (Psophocarpus tetragonolobus), smooth rattlebox (Crotalaria pallida), and butterfly pea (Clitoria ternatea), with assembled genome sizes of 0.62 Gb, 0.59 Gb, 0.71 Gb, 1.22 Gb, and 1.72 Gb, respectively. We found that a long period of elevated long terminal repeat retrotransposon activity is the major reason for the enlarged genomes of smooth rattlebox and butterfly pea. Additionally, there have been no recent whole-genome duplication (WGD) events in these 5 species apart from the shared papilionoid-specific WGD event (∼55 million years ago). We then identified 5,328 and 10,434 species-specific genes in scarlet runner bean and common bean, respectively, which may underlie their phenotypic and functional differences and species-specific functions. Furthermore, we identified the key genes involved in root-nodule symbiosis (RNS) in all 5 species. We found that the NIN gene was duplicated in the early Papilionoideae ancestor, and that 1 gene copy was later lost in the smooth rattlebox and butterfly pea lineages. Last, we identified the resistance (R) genes for plant defense in these 5 species and characterized their evolutionary history.
Conclusions: In summary, this study provides chromosome-scale reference genomes for 3 grain and vegetable beans (sword bean, scarlet runner bean, winged bean), along with genomes for a green manure crop (smooth rattlebox) and a food dyeing crop (butterfly pea). These genomes are crucial for studying phylogenetic history, unraveling the evolution of nitrogen-fixing RNS, and advancing plant defense research.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11348429/pdf/
Citations: 0
An interconnected data infrastructure to support large-scale rare disease research.
IF 11.8 · Zone 2 (Biology)
GigaScience · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae058
Lennart F Johansson, Steve Laurie, Dylan Spalding, Spencer Gibson, David Ruvolo, Coline Thomas, Davide Piscia, Fernanda de Andrade, Gerieke Been, Marieke Bijlsma, Han Brunner, Sandi Cimerman, Farid Yavari Dizjikan, Kornelia Ellwanger, Marcos Fernandez, Mallory Freeberg, Gert-Jan van de Geijn, Roan Kanninga, Vatsalya Maddi, Mehdi Mehtarizadeh, Pieter Neerincx, Stephan Ossowski, Ana Rath, Dieuwke Roelofs-Prins, Marloes Stok-Benjamins, K Joeri van der Velde, Colin Veal, Gerben van der Vries, Marc Wadsley, Gregory Warren, Birte Zurek, Thomas Keane, Holm Graessner, Sergi Beltran, Morris A Swertz, Anthony J Brookes
The Solve-RD project brings together clinicians, scientists, and patient representatives from 51 institutes spanning 15 countries to collaborate on genetically diagnosing ("solving") rare diseases (RDs). The project aims to significantly increase the diagnostic success rate by co-analyzing data from thousands of RD cases, including phenotypes, pedigrees, exome/genome sequencing, and multiomics data. Here we report on the data infrastructure devised and created to support this co-analysis. This infrastructure enables users to store, find, connect, and analyze data and metadata collaboratively. Pseudonymized phenotypic and raw experimental data are submitted to the RD-Connect Genome-Phenome Analysis Platform and processed through standardized pipelines. The resulting files and newly produced omics data are sent to the European Genome-Phenome Archive, which adds unique file identifiers and provides long-term storage and controlled-access services. MOLGENIS "RD3" and Café Variome "Discovery Nexus" connect data and metadata and offer discovery services, while secure cloud-based "Sandboxes" support multiparty data analysis. This infrastructure design, which has been successfully deployed, provides a blueprint for other projects that need to analyze large amounts of heterogeneous data.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413801/pdf/
Citations: 0
Genomic exploration of the endangered oriental stork, Ciconia boyciana, sheds light on migration adaptation and future conservation.
IF 11.8 · Zone 2 (Biology)
GigaScience · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae081
Shangchen Yang, Yan Liu, Xiaoqing Zhao, Jin Chen, Haimeng Li, Hongrui Liang, Jiale Fan, Mengchao Zhou, Shiqing Wang, Xiaotian Zhang, Minhui Shi, Lei Han, Mingyuan Yu, Yaxian Lu, Boyang Liu, Yu Xu, Tianming Lan, Zhijun Hou
Background: The oriental stork, Ciconia boyciana, is an endangered migratory bird listed on the International Union for Conservation of Nature's Red List. Its population has declined rapidly over the past decades, and its nest locations and stopover sites have been largely degraded by human-bird conflicts. Multipronged conservation efforts are required to secure the future of oriental storks. We propose that a thorough understanding of the genome-wide genetic background of this threatened species is critical for developing future conservation strategies.
Findings: In this study, we present the first chromosome-scale reference genome for the oriental stork, with high quality, contiguity, and accuracy. The assembled genome size was 1.24 Gb with a scaffold N50 of 103 Mb, and 1.23 Gb of contigs (99.32%) were anchored to 35 chromosomes. Population genomic analysis showed no genetic structure in the wild population. The genome-wide genetic diversity (π = 0.0012) of the oriental stork was moderate to high among threatened bird species, and the inbreeding risk was not significant (FROH = 5.56% ± 5.30%). Reconstruction of demographic history indicated a rapid recent population decline, likely driven by human activities. Positively selected genes associated with the migratory trait were identified. Their functions relate to long-term potentiation, photoreceptor cell organization, circadian rhythm, muscle development, and energy metabolism, indicating an essential interplay between genetic and ecological adaptation.
Conclusions: Our study presents the first chromosome-scale genome assembly of the oriental stork. It provides a genomic basis for understanding the species' genetic background, extinction risk, and migratory characteristics, which will inform future conservation planning for this species.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11494145/pdf/
Citations: 0
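The stork study summarizes genetic diversity as π = 0.0012 and inbreeding as FROH. The Python sketch below illustrates the standard textbook definitions of both. It assumes biallelic sites and a known number of sampled chromosomes. π is the average proportion of sites that differ between two randomly chosen chromosomes. FROH is the fraction of the genome lying in runs of homozygosity (ROH). The study's own estimates come from its own pipeline; the inputs below are toy values.

```python
from typing import Sequence


def nucleotide_diversity(derived_counts: Sequence[int], n_chromosomes: int,
                         sequence_length: int) -> float:
    """Per-site nucleotide diversity (pi) for biallelic sites.

    derived_counts: derived-allele count at each variable site.
    n_chromosomes: sampled chromosomes (2 x diploid individuals).
    sequence_length: all surveyed sites, including monomorphic ones.
    """
    n = n_chromosomes
    if n < 2:
        raise ValueError("need at least 2 chromosomes")
    pairs = n * (n - 1) / 2
    # At a site with k derived alleles, k * (n - k) of the chromosome pairs differ.
    diffs = sum(k * (n - k) / pairs for k in derived_counts)
    return diffs / sequence_length


def f_roh(roh_lengths: Sequence[int], genome_length: int) -> float:
    """Fraction of the genome covered by runs of homozygosity (FROH)."""
    return sum(roh_lengths) / genome_length


# Toy data: 4 sampled chromosomes, 2 variable sites in a 10-bp window.
print(nucleotide_diversity([2, 1], n_chromosomes=4, sequence_length=10))
print(f_roh([50, 50], genome_length=1000))  # 0.1
```

Monomorphic sites contribute nothing to the numerator but do count in `sequence_length`. This is why π is typically a small number such as the 0.0012 reported above.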
The probability of edge existence due to node degree: a baseline for network-based predictions.
IF 11.8 · Zone 2 (Biology)
GigaScience · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae001
Michael Zietz, Daniel S Himmelstein, Kyle Kloster, Christopher Williams, Michael W Nagle, Casey S Greene
Important tasks in biomedical discovery, such as predicting gene functions, gene-disease associations, and drug repurposing opportunities, are often framed as network edge prediction. The number of edges connecting to a node, termed its degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the degree distribution could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework uses network permutation to generate features that depend only on degree. It thereby decomposes performance into the proportions attributable to degree and to the network's specific connections. We find that performance attributable to factors other than degree is often only a small portion of overall performance. Researchers seeking to predict new or missing edges in biological networks should use our permutation approach to obtain a baseline for performance that may be nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10848215/pdf/
Citations: 0
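The permutation framework relies on generating networks that keep every node's degree but scramble which nodes are connected. The authors' own implementation is the xswap package linked in the abstract. The pure-Python sketch below is not that package's API; it only illustrates the underlying degree-preserving edge-swap idea for an undirected simple graph. `permute_edges` and `edge_prior` are hypothetical names. `edge_prior` estimates, for each node pair, the fraction of permuted networks that contain that edge, which gives a degree-only baseline probability in the spirit of the paper.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def _norm(u: int, v: int) -> Edge:
    """Store undirected edges with the smaller node id first."""
    return (u, v) if u < v else (v, u)


def permute_edges(edges: List[Edge], n_swaps: int, seed: int = 0) -> List[Edge]:
    """Degree-preserving randomization of an undirected simple graph.

    Each accepted swap turns edges (a, b), (c, d) into (a, d), (c, b), so
    every node keeps its degree. Swaps that would create a self-loop or a
    duplicate edge are skipped.
    """
    rng = random.Random(seed)
    current = [_norm(u, v) for u, v in edges]
    present: Set[Edge] = set(current)
    if len(current) < 2:
        return current
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(current)), 2)
        (a, b), (c, d) = current[i], current[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        new1, new2 = _norm(a, d), _norm(c, b)
        if a == d or c == b or new1 in present or new2 in present:
            continue  # would create a self-loop or a multi-edge
        present -= {current[i], current[j]}
        present |= {new1, new2}
        current[i], current[j] = new1, new2
    return current


def edge_prior(edges: List[Edge], n_perms: int, n_swaps: int,
               seed: int = 0) -> Dict[Edge, float]:
    """Fraction of permuted networks that contain each node pair as an edge."""
    counts: Dict[Edge, int] = {}
    for k in range(n_perms):
        for e in permute_edges(edges, n_swaps, seed=seed + k):
            counts[e] = counts.get(e, 0) + 1
    return {e: c / n_perms for e, c in counts.items()}
```

Because each permuted network keeps the original degrees, a pair of high-degree nodes tends to receive a high prior whether or not the two nodes are biologically related. This is the kind of nonspecific signal the paper proposes to use as a baseline.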
vEMstitch: an algorithm for fully automatic image stitching of volume electron microscopy.
IF 11.8 · Zone 2 (Biology)
GigaScience · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae076
Bintao He, Yan Zhang, Zhenbang Zhang, Yiran Cheng, Fa Zhang, Fei Sun, Renmin Han
Background: As software and hardware have developed, so has the scale of research into volume electron microscopy (vEM), leading to ever-increasing resolution. Data collection is usually followed by image stitching. The same area is imaged at high resolution with a certain overlap between images, and the images are then stitched together to obtain ultrastructure at both large scale and high resolution. However, there is currently no perfect method for image stitching. Problems arise especially when the global feature distribution of the sample is uneven and the feature points in the overlap area cannot be matched accurately, which results in ghosting in the fused area.
Results: We have developed a novel algorithm called vEMstitch to solve these problems, aiming for seamless and clear stitching of high-resolution images. In vEMstitch, the image transformation model is constructed as a combination of global rigid and local elastic transformations using weighted pixel displacement fields. Specific local geometric constraints and feature re-extraction strategies are incorporated to ensure that the transformation model accurately and completely reflects the characteristics of biological distortions. To demonstrate the applicability of vEMstitch, we conducted thorough testing on simulated datasets involving different transformation combinations, which consistently showed promising performance. Furthermore, in experiments on real data samples, vEMstitch produced clear ultrastructure in the stitching region, reaffirming the effectiveness of the algorithm.
Conclusions: vEMstitch serves as a valuable tool for large-field, high-resolution image stitching. The clear stitched regions facilitate better visualization and identification in vEM analysis. The source code is available at https://github.com/HeracleBT/vEMstitch.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11512480/pdf/
Citations: 0
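vEMstitch combines a global rigid transformation with a local elastic correction built from weighted pixel displacement fields. The Python sketch below illustrates only that general two-stage structure; it is not the vEMstitch algorithm. It first applies a rigid transform (rotation plus translation). It then adds a local displacement interpolated from residuals at matched feature points, using inverse distance weighting. Inverse distance weighting is a swapped-in stand-in for vEMstitch's own weighting scheme, and all function names and parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def rigid(p: Point, theta: float, tx: float, ty: float) -> Point:
    """Global rigid transform: rotate by theta (radians) about the origin, then translate."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty)


def local_displacement(p: Point, anchors: Sequence[Point],
                       residuals: Sequence[Point], power: float = 2.0) -> Point:
    """Inverse-distance-weighted blend of residual displacements at matched points.

    anchors: matched feature points, already in rigidly transformed coordinates.
    residuals: target position minus rigid position at each anchor.
    """
    num_x = num_y = den = 0.0
    for (ax, ay), (rx, ry) in zip(anchors, residuals):
        d = math.hypot(p[0] - ax, p[1] - ay)
        if d == 0.0:
            return (rx, ry)  # exactly on an anchor: use its residual
        w = 1.0 / d ** power  # closer anchors get larger weights
        num_x += w * rx
        num_y += w * ry
        den += w
    if den == 0.0:
        return (0.0, 0.0)  # no anchors: purely rigid
    return (num_x / den, num_y / den)


def warp(p: Point, theta: float, tx: float, ty: float,
         anchors: Sequence[Point], residuals: Sequence[Point],
         power: float = 2.0) -> Point:
    """Rigid transform followed by a local elastic correction."""
    q = rigid(p, theta, tx, ty)
    dx, dy = local_displacement(q, anchors, residuals, power)
    return (q[0] + dx, q[1] + dy)
```

When every residual is zero, the warp reduces to the pure rigid transform. Near a matched point, the local term dominates, which is what allows an elastic model to absorb local, non-rigid distortions that a single global transform cannot.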
Open Data governance at the Canadian Open Neuroscience Platform (CONP): From the Walled Garden to the Arboretum.
IF 11.8 · Zone 2 (Biology)
GigaScience · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giad114
Alexander Bernier, Bartha M Knoppers, Patrick Bermudez, Michael J S Beauvais, Adrian Thorogood, Alan Evans
Scientific research communities pursue dual imperatives when implementing strategies to share their data. These communities attempt to maximize the accessibility of biomedical data for downstream research use, in furtherance of open science objectives. Simultaneously, they safeguard the interests of research participants through data stewardship measures and by integrating suitable risk disclosures into the informed consent process. The Canadian Open Neuroscience Platform (CONP) convened an Ethics and Governance Committee composed of experts in bioethics, neuroethics, and law. The committee's task was to develop holistic policy tools, organizational approaches, and technological supports that align the open governance of data with ethical and legal norms. The CONP has adopted novel platform governance methods that favor full data openness, legitimated through robust deidentification processes and informed consent practices. The experience of the CONP is articulated as a potential template on which other open science efforts can build. This experience highlights 5 principal pillars of a practicable approach to open data governance: informed consent guidance, deidentification practices, ethicolegal metadata, platform-level norms, and commercialization and publication policies. The governance approach adopted by the CONP stands as a viable model for the broader neuroscience and open science communities to adopt for sharing data in full open access.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10787360/pdf/
Citations: 0
A qualitative assessment of using ChatGPT as large language model for scientific workflow development.
IF 11.8 · Zone 2 (Biology)
GigaScience · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae030
Mario Sänger, Ninon De Mecquenem, Katarzyna Ewa Lewińska, Vasilis Bountris, Fabian Lehmann, Ulf Leser, Thomas Kosch
Background: Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets. They offer reproducibility, dependability, and scalability of analyses through automatic parallelization on large compute clusters. However, implementing workflows is difficult because of the many black-box tools involved and the deep infrastructure stack needed to execute them. At the same time, user-support tools are rare, and far fewer examples are available than for classical programming languages.
Results: To address these challenges, we investigate the efficiency of large language models (LLMs), specifically ChatGPT, in supporting users working with scientific workflows. We performed 3 user studies in 2 scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs interpret workflows efficiently but perform worse at exchanging components or making purposeful workflow extensions. We characterize their limitations in these challenging scenarios and suggest future research directions.
Conclusions: Our results show high accuracy in comprehending and explaining scientific workflows but reduced performance in modifying and extending workflow descriptions. These findings clearly illustrate the need for further research in this area.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11186067/pdf/
Citations: 0
gNOMO2: a comprehensive and modular pipeline for integrated multi-omics analyses of microbiomes.
IF 11.8 · Zone 2 (Biology)
GigaScience · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae038
Muzaffer Arikan, Thilo Muth
Background: In recent years, omics technologies have offered an exceptional opportunity to gain deeper insight into the structural and functional characteristics of microbial communities. As a result, there is a growing demand for user-friendly, reproducible, and versatile bioinformatic tools that can effectively harness multi-omics data to provide a holistic understanding of microbiomes. Previously, we introduced gNOMO, a bioinformatic pipeline tailored to analyze microbiome multi-omics data in an integrative manner. In response to evolving demands within the microbiome field and the growing need for integrated multi-omics data analysis, we have substantially enhanced the gNOMO pipeline.
Results: Here, we present gNOMO2, a comprehensive and modular pipeline that can seamlessly manage various omics combinations of 2 to 4 distinct omics data types. These types include 16S ribosomal RNA (rRNA) gene amplicon sequencing, metagenomics, metatranscriptomics, and metaproteomics. Furthermore, gNOMO2 features a specialized module for processing 16S rRNA gene amplicon sequencing data to create a protein database suitable for metaproteomics investigations. Moreover, it incorporates new differential abundance, integration, and visualization approaches, enhancing the toolkit for more insightful analysis of microbiomes. The functionality of these new features is showcased using 4 microbiome multi-omics datasets encompassing various ecosystems and omics combinations. gNOMO2 not only replicated most of the primary findings from these studies but also offered further valuable perspectives.
Conclusions: gNOMO2 enables thorough integration of taxonomic and functional analyses in microbiome multi-omics data, offering novel insights in both host-associated and free-living microbiome research. gNOMO2 is freely available at https://github.com/muzafferarikan/gNOMO2.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11240238/pdf/
Citations: 0
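Contact us: info@booksci.cn. Book学术 provides a free academic literature search service that helps scholars in China and abroad search Chinese- and English-language literature, and is committed to providing the most convenient, high-quality service experience. Copyright © 2023 布克学术. All rights reserved.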