GigaScience: Latest Articles

Knowledge graph-based thought: a knowledge graph-enhanced LLM framework for pan-cancer question answering.
IF 11.8, Zone 2, Biology
GigaScience, vol. 14. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giae082
Yichun Feng, Lu Zhou, Chao Ma, Yikai Zheng, Ruikun He, Yixue Li
Background: In recent years, large language models (LLMs) have shown promise in various domains, notably in biomedical sciences. However, their real-world application is often limited by issues like erroneous outputs and hallucinatory responses.
Results: We developed the knowledge graph-based thought (KGT) framework, an innovative solution that integrates LLMs with knowledge graphs (KGs) to improve their initial responses by utilizing verifiable information from KGs, thus significantly reducing factual errors in reasoning. The KGT framework demonstrates strong adaptability and performs well across various open-source LLMs. Notably, KGT can facilitate the discovery of new uses for existing drugs through potential drug-cancer associations and can assist in predicting resistance by analyzing relevant biomarkers and genetic mechanisms. To evaluate the knowledge graph question answering task within biomedicine, we used a pan-cancer knowledge graph to develop a benchmark named pan-cancer question answering.
Conclusions: The KGT framework substantially improves the accuracy and utility of LLMs in the biomedical field. This study serves as a proof of concept, demonstrating the framework's exceptional performance in biomedical question answering.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11702363/pdf/
Citations: 0
Characteristics and filtering of low-frequency artificial short deletion variations based on nanopore sequencing.
IF 11.8, Zone 2, Biology
GigaScience, vol. 14. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giaf018
Fuqiang Ye, Juanjuan Zhu, Xiaomin Zhang, Jiarong Zhang, Zihan Xie, Tingting Yang, Yifang Han, Xiaohong Yang, Zilin Ren, Ming Ni
Background: Nanopore sequencing is characterized by high portability and long reads, albeit accompanied by systematic errors causing short deletions. Few tools can filter low-frequency artificial deletions, especially in single samples.
Results: To solve this problem, we first synthesized or purchased 17 DNA/RNA standards for nanopore sequencing with R9 and R10 flowcells to obtain benchmarking datasets. False-positive (FP) deletions were prevalent (75.86%-96.26%), and the majority (62.07%-79.68%) were located in homopolymeric regions. The 10-mer base-quality scores (Q scores) and sequencing speeds flanking the FP homopolymeric deletions differed only marginally from those flanking the true-positive (TP) deletions. We thus investigated the raw current signals after normalizing them by length and found more significant differences in current signals between the reads with and without FP deletions. Indexes including the statistic A of the Multiple Response Permutation Procedure (MRPP A), the accumulative difference of normalized current signals, and the Q score were tested for their power to distinguish between FP and TP deletions. MRPP A outperformed the other indexes in homopolymeric regions and achieved the highest accuracy of 76.73% for challenging 1-base homopolymeric deletions. When sequencing depth was low, the Q score performed better than MRPP A. We developed Delter (Deletion filter) to filter low-frequency FP deletions of nanopore sequencing in single samples, which removed 60.98% to 100% of artificial homopolymeric deletions in real samples.
Conclusions: Low-frequency artificial short deletion variations, especially the most challenging homopolymeric deletions, could be effectively filtered by Delter using normalized current signals or Q scores according to the employed sequencing strategies.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11927395/pdf/
Citations: 0
A telomere-to-telomere phased genome of an octoploid strawberry reveals a receptor kinase conferring anthracnose resistance.
IF 11.8, Zone 2, Biology
GigaScience, vol. 14. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giaf005
Hyeondae Han, Natalia Salinas, Christopher R Barbey, Yoon Jeong Jang, Zhen Fan, Sujeet Verma, Vance M Whitaker, Seonghee Lee
Background: Cultivated strawberry (Fragaria ×ananassa Duch.), an allo-octoploid species arising from at least 3 diploid progenitors, poses a challenge for genomic analysis due to its high levels of heterozygosity and the complex nature of its polyploid genome.
Results: This study developed the complete haplotype-phased genome sequence of a short-day strawberry, 'Florida Brilliance', without parental data, assembling 56 chromosomes from telomere to telomere. This assembly was achieved with high-fidelity long reads and high-throughput chromatin capture sequencing (Hi-C). The centromere core regions and 96,104 genes were annotated using long-read isoform RNA sequencing. Using the high-quality haplotype-phased reference genome, FaFB1, we identified the causal mutation within the gene encoding Leaf Rust 10 Disease-Resistance Locus Receptor-like Protein Kinase (LRK10) that confers resistance to anthracnose fruit rot (AFR). This disease is caused by the Colletotrichum acutatum species complex and results in significant economic losses in strawberry production. Comparison of resistant and susceptible haplotype assemblies and full-length transcript data revealed a 29-bp insertion at the first exon of the susceptible allele, leading to a premature stop codon and loss of gene function. The functional role of LRK10 in resistance to AFR was validated using a simplified Agrobacterium-based transformation method for transient gene expression analysis in strawberry fruits. Transient knockdown and overexpression of LRK10 in fruit indicate a key role for LRK10 in AFR resistance in strawberry.
Conclusions: The FaFB1 assembly along with other resources will be valuable for the discovery of additional candidate genes associated with disease resistance and fruit quality, which will not only advance our understanding of genes and their functions but also facilitate advancements in genome editing in strawberry.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11899574/pdf/
Citations: 0
Healthy microbiome-moving towards functional interpretation.
IF 11.8, Zone 2, Biology
GigaScience, vol. 14. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giaf015
Kinga Zielińska, Klas I Udekwu, Witold Rudnicki, Alina Frolova, Paweł P Łabaj
Background: Microbiome-based disease prediction has significant potential as an early, noninvasive marker of multiple health conditions linked to dysbiosis of the human gut microbiota, thanks in part to decreasing sequencing and analysis costs. Microbiome health indices and other computational tools currently proposed in the field are often based on a microbiome's species richness and are completely reliant on taxonomic classification. A resurgent interest in a metabolism-centric, ecological approach has led to an increased understanding of microbiome metabolic and phenotypic complexity, revealing substantial restrictions of taxonomy-reliant approaches.
Findings: In this study, we introduce a new metagenomic health index developed as an answer to recent developments in microbiome definitions, in an effort to distinguish between healthy and unhealthy microbiomes, focusing here on inflammatory bowel disease (IBD). The novelty of our approach is a shift from a traditional Linnaean phylogenetic classification toward a more holistic consideration of the metabolic functional potential underlying ecological interactions between species. Based on well-explored data cohorts, we compare our method and its performance with the most comprehensive indices to date, the taxonomy-based Gut Microbiome Health Index (GMHI) and the high-dimensional principal component analysis (hiPCA) method, as well as with the standard taxon- and function-based Shannon entropy scoring. After demonstrating better performance than the other methods on the initially targeted IBD cohorts, we retrain our index on an additional 27 datasets obtained from different clinical conditions and validate our index's ability to distinguish between healthy and disease states using a variety of complementary benchmarking approaches. Finally, we demonstrate its superiority over the GMHI and the hiPCA on a longitudinal COVID-19 cohort and highlight the distinct robustness of our method to sequencing depth.
Conclusions: Overall, we emphasize the potential of this metagenomic approach and advocate a shift toward functional approaches to better understand and assess microbiome health as well as provide directions for future index enhancements. Our method, q2-predict-dysbiosis (Q2PD), is freely available (https://github.com/Kizielins/q2-predict-dysbiosis).
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11927397/pdf/
Citations: 0
GeneSetCart: assembling, augmenting, combining, visualizing, and analyzing gene sets.
IF 11.8, Zone 2, Biology
GigaScience, vol. 14. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giaf025
Giacomo B Marino, Stephanie Olaiya, John Erol Evangelista, Daniel J B Clarke, Avi Ma'ayan
Abstract: Converting multiomics datasets into gene sets facilitates data integration that leads to knowledge discovery. Although there are tools developed to analyze gene sets, only a few offer the management of gene sets from multiple sources. GeneSetCart is an interactive web-based platform that enables investigators to gather gene sets from various sources; augment these sets with gene-gene coexpression correlations and protein-protein interactions; perform set operations on these sets such as union, consensus, and intersection; and visualize and analyze these gene sets, all in one place. GeneSetCart supports the upload of single or multiple gene sets, as well as fetching gene sets by searching PubMed for genes comentioned with terms in publications. Venn diagrams, heatmaps, Uniform Manifold Approximation and Projection (UMAP) plots, SuperVenn diagrams, and UpSet plots can visualize the gene sets in a GeneSetCart session to summarize the similarity and overlap among the sets. Users of GeneSetCart can also perform enrichment analysis on their assembled gene sets with external tools. All gene sets in a session can be saved to a user account for reanalysis and sharing with collaborators. GeneSetCart has a gene set library crossing feature that enables analysis of gene sets created from several National Institutes of Health Common Fund programs. For the top overlapping sets from pairs of programs, a large language model (LLM) is prompted to propose possible reasons for the high overlap. Using this feature, two use cases are presented. In addition, users of GeneSetCart can produce publication-ready reports from their uploaded sets. Text in these reports is also supplemented with an LLM. Overall, GeneSetCart is a useful resource enabling biologists without programming expertise to facilitate data integration for hypothesis generation.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11984350/pdf/
Citations: 0
A large collection of bioinformatics question-query pairs over federated knowledge graphs: methodology and applications.
IF 11.8, Zone 2, Biology
GigaScience, vol. 14. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giaf045
Jerven Bolleman, Vincent Emonet, Adrian Altenhoff, Amos Bairoch, Marie-Claude Blatter, Alan Bridge, Séverine Duvaud, Elisabeth Gasteiger, Dmitry Kuznetsov, Sébastien Moretti, Pierre-Andre Michel, Anne Morgat, Marco Pagni, Nicole Redaschi, Monique Zahn-Zabal, Tarcisio Mendes de Farias, Ana Claudia Sima
Background: In recent decades, several life science resources have structured data using the same framework and made these accessible using the same query language to facilitate interoperability. Knowledge graphs have seen increased adoption in bioinformatics due to their advantages for representing data in a generic graph format. For example, yummydata.org catalogs more than 60 knowledge graphs accessible through SPARQL, a technical query language. Although SPARQL allows powerful, expressive queries, even across physically distributed knowledge graphs, formulating such queries is a challenge for most users. Therefore, to guide users in retrieving the relevant data, many of these resources provide representative examples. These examples can also be an important source of information for machine learning (for example, machine-learning algorithms for translating natural language questions to SPARQL), if a sufficiently large number of examples are provided and published in a common, machine-readable, and standardized format across different resources.
Findings: We introduce a large collection of human-written natural language questions and their corresponding SPARQL queries over federated bioinformatics knowledge graphs (KGs), collected over several years across different research groups at the SIB Swiss Institute of Bioinformatics. The collection comprises more than 1,000 example questions and queries, including almost 100 federated queries. We propose a methodology to uniformly represent the examples with minimal metadata, based on existing standards. Furthermore, we introduce an extensive set of open-source applications, including query graph visualizations and smart query editors, easily reusable by KG maintainers who adopt the proposed methodology.
Conclusions: We encourage the community to adopt and extend the proposed methodology, towards richer KG metadata and improved Semantic Web services. URL: https://github.com/sib-swiss/sparql-examples.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12083453/pdf/
Citations: 0
External validation of machine learning models-registered models and adaptive sample splitting.
IF 11.8, Zone 2, Biology
GigaScience, vol. 14. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giaf036
Giuseppe Gallitto, Robert Englert, Balint Kincses, Raviteja Kotikalapudi, Jialin Li, Kevin Hoffschlag, Ulrike Bingel, Tamas Spisak
Background: Multivariate predictive models play a crucial role in enhancing our understanding of complex biological systems and in developing innovative, replicable tools for translational medical research. However, the complexity of machine learning methods and extensive data preprocessing and feature engineering pipelines can lead to overfitting and poor generalizability. An unbiased evaluation of predictive models necessitates external validation, which involves testing the finalized model on independent data. Despite its importance, external validation is often neglected in practice due to the associated costs.
Results: Here we propose that, for maximal credibility, model discovery and external validation should be separated by the public disclosure (e.g., preregistration) of feature processing steps and model weights. Furthermore, we introduce a novel approach to optimize the trade-off between efforts spent on model discovery and external validation in such studies. We show on data involving more than 3,000 participants from four different datasets that, for any "sample size budget," the proposed adaptive splitting approach can successfully identify the optimal time to stop model discovery so that predictive performance is maximized without risking a low-powered, and thus inconclusive, external validation.
Conclusion: The proposed design and splitting approach (implemented in the Python package "AdaptiveSplit") may contribute to addressing issues of replicability, effect size inflation, and generalizability in predictive modeling studies.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077397/pdf/
Citations: 0
Observational, causal relationship and shared genetic basis between cholelithiasis and gastroesophageal reflux disease: evidence from a cohort study and comprehensive genetic analysis.
IF 11.8, Zone 2, Biology
GigaScience, vol. 14. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giaf023
Yanlin Lyu, Shuangshuang Tong, Wentao Huang, Yuying Ma, Ruijie Zeng, Rui Jiang, Ruibang Luo, Felix W Leung, Qizhou Lian, Weihong Sha, Hao Chen
Objective: Cholelithiasis and gastroesophageal reflux disease (GERD) contribute to significant health concerns. We aimed to investigate the potential observational, causal, and genetic relationships between cholelithiasis and GERD.
Design: The observational correlations were assessed based on the prospective cohort study from UK Biobank. Then, by leveraging the genome-wide summary statistics of cholelithiasis (N = 334,277) and GERD (N = 332,601), the bidirectional causal associations were evaluated using Mendelian randomization (MR) analysis. Subsequently, a series of genetic analyses was used to assess the genetic correlation, shared loci, and genes between cholelithiasis and GERD.
Results: The prospective cohort analyses revealed a significantly increased risk of GERD in individuals with cholelithiasis (hazard ratio [HR] = 1.99; 95% confidence interval [CI], 1.89-2.10) and a higher risk of cholelithiasis among patients with GERD (HR = 2.30; 95% CI, 2.18-2.44). The MR study indicated a causal effect of genetic liability to cholelithiasis on the incidence of GERD (odds ratio [OR] = 1.08; 95% CI, 1.05-1.11) and a causal effect of genetically predicted GERD on cholelithiasis (OR = 1.15; 95% CI, 1.02-1.31). In addition, cholelithiasis and GERD exhibited a strong genetic association. Cross-trait meta-analyses identified 5 novel independent loci shared between cholelithiasis and GERD. Three shared genes, SUN2, CBY1, and JOSD1, were further identified as novel risk genes.
Conclusion: The elucidation of the shared genetic basis underlying the phenotypic relationship of these 2 complex phenotypes offers new insights into the intrinsic linkage between cholelithiasis and GERD, providing a novel research direction for future therapeutic strategy and risk prediction.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11943489/pdf/
Citations: 0
A novel dataset for nuclei and tissue segmentation in melanoma with baseline nuclei segmentation and tissue segmentation benchmarks.
IF 11.8, Zone 2, Biology
GigaScience, vol. 14. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giaf011
Mark Schuiveling, Hong Liu, Daniel Eek, Gerben E Breimer, Karijn P M Suijkerbuijk, Willeke A M Blokx, Mitko Veta
Background: Melanoma is an aggressive form of skin cancer in which tumor-infiltrating lymphocytes (TILs) are a biomarker for recurrence and treatment response. Manual TIL assessment is prone to interobserver variability, and current deep learning models are not publicly accessible or have low performance. Deep learning models, however, have the potential for consistent spatial evaluation of TILs and other immune cell subsets, with the potential for improved prognostic and predictive value. To make the development of these models possible, we created the Panoptic Segmentation of nUclei and tissue in advanced MelanomA (PUMA) dataset and assessed the performance of several state-of-the-art deep learning models. In addition, we show how to improve model performance further by using heuristic postprocessing in which nuclei classes are updated based on their tissue localization.
Results: The PUMA dataset includes 155 primary and 155 metastatic melanoma hematoxylin and eosin-stained regions of interest with nuclei and tissue annotations from a single melanoma referral institution. The Hover-NeXt model, trained on the PUMA dataset, demonstrated the best performance for lymphocyte detection, approaching human interobserver agreement. In addition, heuristic postprocessing of deep learning models improved the detection of noncommon classes, such as epithelial nuclei.
Conclusion: The PUMA dataset is the first melanoma-specific dataset that can be used to develop melanoma-specific nuclei and tissue segmentation models. These models can, in turn, be used for prognostic and predictive biomarker development. Incorporating tissue and nuclei segmentation is a step toward improved deep learning nuclei segmentation performance. To support the development of these models, this dataset is used in the PUMA challenge.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11837757/pdf/
Citations: 0
Opaque ontology: neuroimaging classification of ICD-10 diagnostic groups in the UK Biobank.
IF 11.8, Zone 2, Biology
GigaScience, vol. 14. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giae119
Ty Easley, Xiaoke Luo, Kayla Hannon, Petra Lenzini, Janine Bijsterbosch
Background: The use of machine learning to classify diagnostic cases versus controls, defined based on diagnostic ontologies such as the International Classification of Diseases, Tenth Revision (ICD-10), from neuroimaging features is now commonplace across a wide range of diagnostic fields. However, transdiagnostic comparisons of such classifications are lacking. Such transdiagnostic comparisons are important to establish the specificity of classification models, set benchmarks, and assess the value of diagnostic ontologies.
Results: We investigated case-control classification accuracy in 17 different ICD-10 diagnostic groups from Chapter V (mental and behavioral disorders) and Chapter VI (diseases of the nervous system) using data from the UK Biobank. Classification models were trained using either neuroimaging (structural or functional brain magnetic resonance imaging feature sets) or sociodemographic features. Random forest classification models were adopted using rigorous shuffle-splits to estimate stability as well as accuracy of case-control classifications. Diagnostic classification accuracies were benchmarked against age classification (oldest vs. youngest) from the same feature sets and against additional classifier types (k-nearest neighbors and linear support vector machine). In contrast to age classification accuracy, which was high for all feature sets, few ICD-10 diagnostic groups were classified significantly above chance (namely, demyelinating diseases based on structural neuroimaging features and depression based on sociodemographic and functional neuroimaging features).
Conclusion: These findings highlight challenges with the current disease classification system, leading us to recommend caution with the use of ICD-10 diagnostic groups as target labels in brain-based disease prediction studies.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11811528/pdf/
Citations: 0