{"title":"Observational, causal relationship and shared genetic basis between cholelithiasis and gastroesophageal reflux disease: evidence from a cohort study and comprehensive genetic analysis.","authors":"Yanlin Lyu, Shuangshuang Tong, Wentao Huang, Yuying Ma, Ruijie Zeng, Rui Jiang, Ruibang Luo, Felix W Leung, Qizhou Lian, Weihong Sha, Hao Chen","doi":"10.1093/gigascience/giaf023","DOIUrl":"10.1093/gigascience/giaf023","url":null,"abstract":"<p><strong>Objective: </strong>Cholelithiasis and gastroesophageal reflux disease (GERD) contribute to significant health concerns. We aimed to investigate the potential observational, causal, and genetic relationships between cholelithiasis and GERD.</p><p><strong>Design: </strong>The observational correlations were assessed based on the prospective cohort study from UK Biobank. Then, by leveraging the genome-wide summary statistics of cholelithiasis (N = 334,277) and GERD (N = 332,601), the bidirectional causal associations were evaluated using Mendelian randomization (MR) analysis. Subsequently, a series of genetic analyses was used to assess the genetic correlation, shared loci, and genes between cholelithiasis and GERD.</p><p><strong>Results: </strong>The prospective cohort analyses revealed a significantly increased risk of GERD in individuals with cholelithiasis (hazard ratio [HR] = 1.99; 95% confidence interval [CI], 1.89-2.10) and a higher risk of cholelithiasis among patients with GERD (HR = 2.30; 95% CI, 2.18-2.44). The MR study indicated the causal effect of genetic liability to cholelithiasis on the incidence of GERD (odds ratio [OR] = 1.08; 95% CI, 1.05-1.11) and the causal effect of genetic predicted GERD on cholelithiasis (OR = 1.15; 95% CI, 1.02-1.31). In addition, cholelithiasis and GERD exhibited a strong genetic association. Cross-trait meta-analyses identified 5 novel independent loci shared between cholelithiasis and GERD. 
Three shared genes, including SUN2, CBY1, and JOSD1, were further identified as novel risk genes.</p><p><strong>Conclusion: </strong>The elucidation of the shared genetic basis underlying the phenotypic relationship of these 2 complex phenotypes offers new insights into the intrinsic linkage between cholelithiasis and GERD, providing a novel research direction for future therapeutic strategy and risk prediction.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11943489/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143729537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf066
{"title":"Retraction and replacement of: Telomere-to-telomere genome and resequencing of 254 individuals reveal evolution, genomic footprints in Asian icefish, Protosalanx chinensis.","authors":"","doi":"10.1093/gigascience/giaf066","DOIUrl":"10.1093/gigascience/giaf066","url":null,"abstract":"","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12266832/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144649192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf036
Giuseppe Gallitto, Robert Englert, Balint Kincses, Raviteja Kotikalapudi, Jialin Li, Kevin Hoffschlag, Ulrike Bingel, Tamas Spisak
{"title":"External validation of machine learning models-registered models and adaptive sample splitting.","authors":"Giuseppe Gallitto, Robert Englert, Balint Kincses, Raviteja Kotikalapudi, Jialin Li, Kevin Hoffschlag, Ulrike Bingel, Tamas Spisak","doi":"10.1093/gigascience/giaf036","DOIUrl":"10.1093/gigascience/giaf036","url":null,"abstract":"<p><strong>Background: </strong>Multivariate predictive models play a crucial role in enhancing our understanding of complex biological systems and in developing innovative, replicable tools for translational medical research. However, the complexity of machine learning methods and extensive data preprocessing and feature engineering pipelines can lead to overfitting and poor generalizability. An unbiased evaluation of predictive models necessitates external validation, which involves testing the finalized model on independent data. Despite its importance, external validation is often neglected in practice due to the associated costs.</p><p><strong>Results: </strong>Here we propose that, for maximal credibility, model discovery and external validation should be separated by the public disclosure (e.g., preregistration) of feature processing steps and model weights. Furthermore, we introduce a novel approach to optimize the trade-off between efforts spent on model discovery and external validation in such studies. 
We show on data involving more than 3,000 participants from four different datasets that, for any \"sample size budget,\" the proposed adaptive splitting approach can successfully identify the optimal time to stop model discovery so that predictive performance is maximized without risking a low-powered, and thus inconclusive, external validation.</p><p><strong>Conclusion: </strong>The proposed design and splitting approach (implemented in the Python package \"AdaptiveSplit\") may contribute to addressing issues of replicability, effect size inflation, and generalizability in predictive modeling studies.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077397/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144077476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf025
Giacomo B Marino, Stephanie Olaiya, John Erol Evangelista, Daniel J B Clarke, Avi Ma'ayan
{"title":"GeneSetCart: assembling, augmenting, combining, visualizing, and analyzing gene sets.","authors":"Giacomo B Marino, Stephanie Olaiya, John Erol Evangelista, Daniel J B Clarke, Avi Ma'ayan","doi":"10.1093/gigascience/giaf025","DOIUrl":"10.1093/gigascience/giaf025","url":null,"abstract":"<p><p>Converting multiomics datasets into gene sets facilitates data integration that leads to knowledge discovery. Although there are tools developed to analyze gene sets, only a few offer the management of gene sets from multiple sources. GeneSetCart is an interactive web-based platform that enables investigators to gather gene sets from various sources; augment these sets with gene-gene coexpression correlations and protein-protein interactions; perform set operations on these sets such as union, consensus, and intersection; and visualize and analyze these gene sets, all in one place. GeneSetCart supports the upload of single or multiple gene sets, as well as fetching gene sets by searching PubMed for genes comentioned with terms in publications. Venn diagrams, heatmaps, Uniform Manifold Approximation and Projection (UMAP) plots, SuperVenn diagrams, and UpSet plots can visualize the gene sets in a GeneSetCart session to summarize the similarity and overlap among the sets. Users of GeneSetCart can also perform enrichment analysis on their assembled gene sets with external tools. All gene sets in a session can be saved to a user account for reanalysis and sharing with collaborators. GeneSetCart has a gene set library crossing feature that enables analysis of gene sets created from several National Institutes of Health Common Fund programs. For the top overlapping sets from pairs of programs, a large language model (LLM) is prompted to propose possible reasons for the high overlap. Using this feature, two use cases are presented. In addition, users of GeneSetCart can produce publication-ready reports from their uploaded sets. 
Text in these reports is also supplemented with an LLM. Overall, GeneSetCart is a useful resource enabling biologists without programming expertise to facilitate data integration for hypothesis generation.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11984350/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143975144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf039
Yuheng Du, Paula A Benny, Yuchen Shao, Ryan J Schlueter, Alexandra Gurary, Annette Lum-Jones, Cameron B Lassiter, Fadhl M AlAkwaa, Maarit Tiirikainen, Dena Towner, W Steven Ward, Lana X Garmire
{"title":"Multiomics analysis of umbilical cord hematopoietic stem cells from a multiethnic cohort of Hawaii reveals the intergenerational effect of maternal prepregnancy obesity and risks for cancers.","authors":"Yuheng Du, Paula A Benny, Yuchen Shao, Ryan J Schlueter, Alexandra Gurary, Annette Lum-Jones, Cameron B Lassiter, Fadhl M AlAkwaa, Maarit Tiirikainen, Dena Towner, W Steven Ward, Lana X Garmire","doi":"10.1093/gigascience/giaf039","DOIUrl":"10.1093/gigascience/giaf039","url":null,"abstract":"<p><strong>Background: </strong>Maternal obesity is a health concern that may predispose newborns to a high risk of medical problems later in life. To understand the intergenerational effect of maternal obesity, we hypothesized that the maternal obesity effect is mediated by epigenetic changes in the CD34+/CD38-/Lin- hematopoietic stem cells (uHSCs) in the offspring. To investigate this, we conducted a DNA methylation centric multiomics study. We measured DNA methylation and gene expression of the CD34+/CD38-/Lin- uHSCs and metabolomics of the cord blood, all from a multiethnic cohort from Kapiolani Medical Center for Women and Children in Honolulu, Hawaii (n=72, collected between 2016 and 2018).</p><p><strong>Results: </strong>Differential methylation analysis unveiled a global hypermethylation pattern in the maternal prepregnancy obese group (BH adjusted P < 0.05), after adjusting for major clinical confounders. KEGG pathway enrichment, WGCNA, and PPI analyses revealed that hypermethylated CpG sites were involved in critical biological processes, including cell cycle, protein synthesis, immune signaling, and lipid metabolism. Utilizing Shannon entropy on uHSCs methylation, we discerned notably higher quiescence of uHSCs impacted by maternal obesity. 
Additionally, the integration of multiomics data (including methylation, gene expression, and metabolomics) provided further evidence of dysfunctions in adipogenesis, erythropoietin production, cell differentiation, and DNA repair, aligning with the findings at the epigenetic level. Furthermore, we trained a random forest classifier using the CpG sites in the genes of the top pathways associated with maternal obesity, and applied it to predict cancer versus adjacent normal sample labels in 14 Cancer Genome Atlas (TCGA) cancer types. Five of 14 cancers showed balanced accuracy of 0.6 or higher: LUSC (0.87), PAAD (0.83), KIRC (0.71), KIRP (0.63) and BRCA (0.60).</p><p><strong>Conclusions: </strong>This study revealed the significant correlation between prepregnancy maternal obesity and multiomics-level molecular changes in the uHSCs of offspring, particularly at the DNA methylation level. These maternal-obesity-associated epigenetic markers in uHSCs may contribute to increased risks in certain cancers of the offspring. Larger and multicenter cohort validation studies are warranted to follow up the current single-site study.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12087453/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144101599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf102
Nanna Gaun, Carlotta Pietroni, Garazi Martin-Bideguren, Jonas Lauritsen, Ostaizka Aizpurua, Joana M Fernandes, Eduardo Ferreira, Fabien Aubret, Tom Sarraude, Constant Perry, Lucas Wauters, Claudia Romeo, Martina Spada, Claudia Tranquillo, Alex O Sutton, Michael Griesser, Miyako H Warrington, Guillem Pérez I de Lanuza, Javier Abalos, Prem Aguilar, Ferran de la Cruz, Javier Juste, Pedro Alonso-Alonso, Jim Groombridge, Rebecca Louch, Kevin Ruhomaun, Sion Henshaw, Carlos Cabido, Ion Garin Barrio, Emina Šunje, Peter Hosner, Ivan Prates, Geoffrey M While, Roberto García-Roa, Tobias Uller, Nathalie Feiner, Elisa Bonaccorso, Pernille Klein-Ipsen, Rosalina Molberg Rotovnik, Antton Alberdi, Raphael Eisenhofer
{"title":"The Earth Hologenome Initiative: Data Release 1.","authors":"Nanna Gaun, Carlotta Pietroni, Garazi Martin-Bideguren, Jonas Lauritsen, Ostaizka Aizpurua, Joana M Fernandes, Eduardo Ferreira, Fabien Aubret, Tom Sarraude, Constant Perry, Lucas Wauters, Claudia Romeo, Martina Spada, Claudia Tranquillo, Alex O Sutton, Michael Griesser, Miyako H Warrington, Guillem Pérez I de Lanuza, Javier Abalos, Prem Aguilar, Ferran de la Cruz, Javier Juste, Pedro Alonso-Alonso, Jim Groombridge, Rebecca Louch, Kevin Ruhomaun, Sion Henshaw, Carlos Cabido, Ion Garin Barrio, Emina Šunje, Peter Hosner, Ivan Prates, Geoffrey M While, Roberto García-Roa, Tobias Uller, Nathalie Feiner, Elisa Bonaccorso, Pernille Klein-Ipsen, Rosalina Molberg Rotovnik, Antton Alberdi, Raphael Eisenhofer","doi":"10.1093/gigascience/giaf102","DOIUrl":"10.1093/gigascience/giaf102","url":null,"abstract":"<p><strong>Background: </strong>The Earth Hologenome Initiative (EHI) is a global endeavor dedicated to revisit fundamental ecological and evolutionary questions from the systemic host-microbiota perspective, through the standardized generation and analysis of joint animal genomic and associated microbial metagenomic data.</p><p><strong>Results: </strong>The first data release of the EHI contains 968 shotgun DNA sequencing read files containing 5.2 TB of raw genomic and metagenomic data derived from 21 vertebrate species sampled across 12 countries, as well as 17,666 metagenome-assembled genomes reconstructed from these data.</p><p><strong>Conclusions: </strong>The dataset can be used to address fundamental questions about host-microbiota interactions and will be available to the research community under the EHI data usage conditions.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2025-01-06","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12412122/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145000348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf105
Larissa S Arantes, Tom Brown, Diego De Panis, Scott D Whiting, Erina J Young, Erin L LaCasella, Gabriella A Carvajal, Adam Kennedy, Deana Edmunds, Blair P Bentley, Jennifer Balacco, Conor Whelan, Nivesh Jain, Tatiana Tilley, Brian O'Toole, Patrick Traore, Erich D Jarvis, Oliver Berry, Peter H Dutton, Lisa M Komoroske, Camila J Mazzoni
{"title":"Haplotype-resolved reference genomes of the sea turtle clade unveil ultra-syntenic genomes with hotspots of divergence.","authors":"Larissa S Arantes, Tom Brown, Diego De Panis, Scott D Whiting, Erina J Young, Erin L LaCasella, Gabriella A Carvajal, Adam Kennedy, Deana Edmunds, Blair P Bentley, Jennifer Balacco, Conor Whelan, Nivesh Jain, Tatiana Tilley, Brian O'Toole, Patrick Traore, Erich D Jarvis, Oliver Berry, Peter H Dutton, Lisa M Komoroske, Camila J Mazzoni","doi":"10.1093/gigascience/giaf105","DOIUrl":"10.1093/gigascience/giaf105","url":null,"abstract":"<p><strong>Background: </strong>Reference genomes for the entire sea turtle clade have the potential to reveal the genetic basis of traits driving the ecological and phenotypic diversity in these ancient and iconic marine species. Furthermore, these genomic resources can support conservation efforts and deepen our understanding of their unique evolution.</p><p><strong>Results: </strong>We present haplotype-resolved, chromosome-level reference genomes and high-quality gene annotations for 5 sea turtle species. This completes the catalog of reference genomes of the entire sea turtle clade when combined with our previously published reference genomes. Our analysis reveals remarkable genome synteny and collinearity across all species, despite the clade's origin dating back more than 60 million years. Regions of high interspecific genetic distance and intraspecific genetic diversity are consistently clustered in genomic hotspots, which are enriched with genes coding for immune response proteins, olfactory receptors, zinc fingers, and G-protein-coupled receptors. These hotspot regions may offer insights into the genetic mechanisms driving phenotypic divergence among species and represent areas of significant adaptive potential. 
Ancient demographic analysis revealed a synchronous population expansion among sea turtle species during the Pleistocene, with varying magnitudes of demographic change, likely shaped by their diverse ecological adaptations and biogeographic contexts.</p><p><strong>Conclusions: </strong>Our work provides genomic resources for exploring genetic diversity, evolutionary adaptations, and demographic histories of sea turtles. We outline genomic regions with increased diversity, linked to immune response, sensory evolution, and adaptation to varying environments that have historically been subject to strong diversifying selection and likely will underpin sea turtles' responses to future environmental change. These reference genomes can assist conservation by providing insights into the demographic and evolutionary processes that sustain and threaten these iconic species.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448945/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145091617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The complete genome assembly of Astragalus membranaceus: enabling more accurate genetic research.","authors":"Huibin Qin, Aohui Li, Shuyu Zhong, Huazhi Wang, Hongling Tian","doi":"10.1093/gigascience/giaf117","DOIUrl":"10.1093/gigascience/giaf117","url":null,"abstract":"<p><strong>Background: </strong>Astragalus membranaceus (Fisch.) Bunge is a globally significant medicinal plant renowned for its potent immunomodulatory and antioxidant properties. However, the existing reference genome for this species remains incomplete, characterized by fragmented assemblies and the absence of centromeric and telomeric regions, thereby limiting comprehensive exploration of the genetic mechanisms underlying its key traits.</p><p><strong>Findings: </strong>We hereby present the first complete genome assembly for A. membranaceus (Fisch.) Bge \"AM-T2T,\" achieved through the integration of PacBio HiFi, ultra-long Oxford Nanopore Technologies, and Hi-C sequencing. The assembly achieved a total size of 1.39 Gb with an N50 of 180.45 Mb. The genome exhibits remarkable completeness (99.63% BUSCO completeness; long terminal repeat assembly index of 22.67) and high accuracy (quality value of 57.51; Genome Continuity Inspector score of 36.23). It features annotations of 64.22% repetitive sequences, 16 telomeres, 8 centromeres, 32,600 high-confident genes, 248 cytochrome P450 monooxygenases (CYP450s), and 163 uridine diphosphate glycosyltransferases. Notably, 158.58 Mb of previously unassembled regions were resolved, harboring 4 CYP450s. Additionally, 2,267 unique genes and 20,652 conserved genes were identified within the AM-T2T genome. Comparative analysis with Astragalus mongholicus assembly revealed 1,413 structural variations.</p><p><strong>Conclusions: </strong>This complete genome assembly of A. membranaceus represents a significant advancement in the genomic characterization of A. 
membranaceus, providing a robust resource that will bolster genetic research, breeding programs, and medicinal applications.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486382/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145198970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf017
Engy Nasr, Anna Henger, Björn Grüning, Paul Zierep, Bérénice Batut
{"title":"PathoGFAIR: a collection of FAIR and adaptable (meta)genomics workflows for (foodborne) pathogens detection and tracking.","authors":"Engy Nasr, Anna Henger, Björn Grüning, Paul Zierep, Bérénice Batut","doi":"10.1093/gigascience/giaf017","DOIUrl":"10.1093/gigascience/giaf017","url":null,"abstract":"<p><strong>Background: </strong>Food contamination by pathogens poses a global health threat, affecting an estimated 600 million people annually. During a foodborne outbreak investigation, microbiological analysis of food vehicles detects responsible pathogens and traces contamination sources. Metagenomic approaches offer a comprehensive view of the genomic composition of microbial communities, facilitating the detection of potential pathogens in samples. Combined with sequencing techniques like Oxford Nanopore sequencing, such metagenomic approaches become faster and easier to apply. A key limitation of these approaches is the lack of accessible, easy-to-use, and openly available pipelines for pathogen identification and tracking from (meta)genomic data.</p><p><strong>Findings: </strong>PathoGFAIR is a collection of Galaxy-based Findable, Accessible, Interoperable, and Reusable (FAIR) workflows employing state-of-the-art tools to detect and track pathogens from metagenomic Nanopore sequencing. Although initially developed to detect pathogens in food datasets, the workflows can be applied to other metagenomic Nanopore pathogenic data. PathoGFAIR incorporates visualizations and reports for comprehensive results. We tested PathoGFAIR on 130 samples containing different pathogens from multiple hosts under various experimental conditions. For all but 1 sample, workflows have successfully detected expected pathogens at least at the species rank. 
Further taxonomic ranks are detected for samples with sufficiently high colony-forming unit and low cycle threshold values.</p><p><strong>Conclusions: </strong>PathoGFAIR detects the pathogens at species and subspecies taxonomic ranks in all but 1 tested sample, regardless of whether the pathogen is isolated or the sample is incubated before sequencing. Importantly, PathoGFAIR is easy to use and can be straightforwardly adapted and extended for other types of analysis and sequencing techniques, making it usable in various pathogen detection scenarios.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12466118/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145148686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giae119
Ty Easley, Xiaoke Luo, Kayla Hannon, Petra Lenzini, Janine Bijsterbosch
{"title":"Opaque ontology: neuroimaging classification of ICD-10 diagnostic groups in the UK Biobank.","authors":"Ty Easley, Xiaoke Luo, Kayla Hannon, Petra Lenzini, Janine Bijsterbosch","doi":"10.1093/gigascience/giae119","DOIUrl":"10.1093/gigascience/giae119","url":null,"abstract":"<p><strong>Background: </strong>The use of machine learning to classify diagnostic cases versus controls defined based on diagnostic ontologies such as the International Classification of Diseases, Tenth Revision (ICD-10) from neuroimaging features is now commonplace across a wide range of diagnostic fields. However, transdiagnostic comparisons of such classifications are lacking. Such transdiagnostic comparisons are important to establish the specificity of classification models, set benchmarks, and assess the value of diagnostic ontologies.</p><p><strong>Results: </strong>We investigated case-control classification accuracy in 17 different ICD-10 diagnostic groups from Chapter V (mental and behavioral disorders) and Chapter VI (diseases of the nervous system) using data from the UK Biobank. Classification models were trained using either neuroimaging (structural or functional brain magnetic resonance imaging feature sets) or sociodemographic features. Random forest classification models were adopted using rigorous shuffle-splits to estimate stability as well as accuracy of case-control classifications. Diagnostic classification accuracies were benchmarked against age classification (oldest vs. youngest) from the same feature sets and against additional classifier types (k-nearest neighbors and linear support vector machine). 
In contrast to age classification accuracy, which was high for all feature sets, few ICD-10 diagnostic groups were classified significantly above chance (namely, demyelinating diseases based on structural neuroimaging features and depression based on sociodemographic and functional neuroimaging features).</p><p><strong>Conclusion: </strong>These findings highlight challenges with the current disease classification system, leading us to recommend caution with the use of ICD-10 diagnostic groups as target labels in brain-based disease prediction studies.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11811528/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143390813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}