Bioinformatics (Oxford, England): Latest Articles

MOSTPLAS: A Self-correction Multi-label Learning Model for Plasmid Host Range Prediction.
Bioinformatics (Oxford, England) Pub Date : 2025-02-17 DOI: 10.1093/bioinformatics/btaf075
Wei Zou, Yongxin Ji, Jiaojiao Guan, Yanni Sun
{"title":"MOSTPLAS: A Self-correction Multi-label Learning Model for Plasmid Host Range Prediction.","authors":"Wei Zou, Yongxin Ji, Jiaojiao Guan, Yanni Sun","doi":"10.1093/bioinformatics/btaf075","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf075","url":null,"abstract":"<p><strong>Motivation: </strong>Plasmids play an essential role in horizontal gene transfer, aiding their host bacteria in acquiring beneficial traits like antibiotic and metal resistance. Some plasmids can transfer, replicate or persist in multiple organisms. Identifying the relatively complete host range of these plasmids provides insights into how plasmids promote bacterial evolution. To achieve this, we can apply multi-label learning models for plasmid host range prediction. However, there are no databases providing the detailed and complete host labels of these broad-host-range (BHR) plasmids. Without adequate well-annotated training samples, learning models can fail to extract discriminative feature representations for plasmid host prediction.</p><p><strong>Results: </strong>To address this problem, we propose a self-correction multi-label learning model called MOSTPLAS. We design a pseudo label learning algorithm and a self-correction asymmetric loss to facilitate the training of a multi-label learning model with samples containing some unknown missing labels. We conducted a series of experiments on the NCBI RefSeq plasmid database, the PLSDB 2025 database, plasmids with experimentally determined host labels, a Hi-C dataset and the DoriC dataset. The benchmark results against other plasmid host range prediction tools demonstrated that MOSTPLAS recognized more host labels while keeping a high precision.</p><p><strong>Availability and implementation: </strong>MOSTPLAS is implemented in Python and can be downloaded at https://github.com/wzou96/MOSTPLAS. 
All relevant data we used in the experiments can be found at 10.5281/zenodo.14708999.</p><p><strong>Contact and supplementary information: </strong>Please contact: yannisun@cityu.edu.hk. Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143442775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
GCLink: a graph contrastive link prediction framework for gene regulatory network inference.
Bioinformatics (Oxford, England) Pub Date : 2025-02-17 DOI: 10.1093/bioinformatics/btaf074
Weiming Yu, Zerun Lin, Miaofang Lan, Le Ou-Yang
{"title":"GCLink: a graph contrastive link prediction framework for gene regulatory network inference.","authors":"Weiming Yu, Zerun Lin, Miaofang Lan, Le Ou-Yang","doi":"10.1093/bioinformatics/btaf074","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf074","url":null,"abstract":"<p><strong>Motivation: </strong>Gene regulatory networks (GRNs) unveil the intricate interactions among genes, pivotal in elucidating the complex biological processes within cells. The advent of single-cell RNA-sequencing (scRNA-seq) enables the inference of GRNs at single-cell resolution. However, the majority of current supervised network inference methods concentrate on predicting pairwise gene regulatory interactions, thus failing to fully exploit correlations among all genes and exhibiting limited generalization performance.</p><p><strong>Results: </strong>To address these issues, we propose a graph contrastive link prediction (GCLink) model to infer potential gene regulatory interactions from scRNA-seq data. Based on known gene regulatory interactions and scRNA-seq data, GCLink introduces a graph contrastive learning strategy to aggregate the feature and neighborhood information of genes to learn their representations. This approach reduces the dependence of our model on sample size and enhances its ability to predict potential gene regulatory interactions. Extensive experiments on real scRNA-seq datasets demonstrate that GCLink outperforms other state-of-the-art methods in most cases. 
Furthermore, by pretraining GCLink on a source cell line with abundant known regulatory interactions and fine-tuning it on a target cell line with limited amount of known interactions, our GCLink model exhibits good performance in GRN inference, demonstrating its effectiveness in inferring GRNs from datasets with limited known interactions.</p><p><strong>Availability: </strong>The source code and data are available at https://github.com/Yoyiming/GCLink.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143442792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TiltRec: An ultra-fast and open-source toolkit for cryo-electron tomographic reconstruction.
Bioinformatics (Oxford, England) Pub Date : 2025-02-14 DOI: 10.1093/bioinformatics/btaf068
Yanxin Jiao, Hongjia Li, Yang Xue, Guoliang Yang, Lei Qi, Fa Zhang, Dawei Zang, Renmin Han
{"title":"TiltRec: An ultra-fast and open-source toolkit for cryo-electron tomographic reconstruction.","authors":"Yanxin Jiao, Hongjia Li, Yang Xue, Guoliang Yang, Lei Qi, Fa Zhang, Dawei Zang, Renmin Han","doi":"10.1093/bioinformatics/btaf068","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf068","url":null,"abstract":"<p><strong>Background: </strong>Cryo-electron tomography (cryo-ET) has revolutionized our ability to observe structures from the subcellular to the atomic level in their native states. Achieving high-resolution reconstruction involves collecting tilt series at different angles and subsequently backprojecting them into three-dimensional (3D) space or iteratively reconstructing them to build a 3D volume of the specimen. However, the intricate computational demands of tomographic reconstruction pose significant challenges, requiring extensive calculation times that hinder efficiency, especially with large and complex datasets.</p><p><strong>Results: </strong>We present TiltRec, an open-source toolkit that leverages the parallel capabilities of CPUs and GPUs to enhance tomographic reconstruction. TiltRec implements six classical tomographic reconstruction algorithms, utilizing optimized parallel computation strategies and advanced memory management techniques. Performance evaluations across multiple datasets of varying sizes demonstrate that TiltRec significantly improves efficiency, reducing computational times while maintaining reconstruction resolution.</p><p><strong>Conclusions: </strong>TiltRec effectively addresses the computational challenges associated with cryo-ET reconstruction by fully exploiting parallel acceleration. 
As an open-source tool, TiltRec not only facilitates extensive applications by the research community but also supports further algorithm modifications and extensions, enabling the continued development of novel algorithms.</p><p><strong>Availability and implementation: </strong>The source code, documentation, and sample data can be downloaded at https://github.com/icthrm/TiltRec.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143416616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PNL: a software to build polygenic risk scores using a Super Learner approach based on PairNet, a Convolutional Neural Network.
Bioinformatics (Oxford, England) Pub Date : 2025-02-14 DOI: 10.1093/bioinformatics/btaf071
Ting-Huei Chen, Chia-Jung Lee, Syue-Pu Chen, Shang-Jung Wu, Cathy S J Fann
{"title":"PNL: a software to build polygenic risk scores using a Super Learner approach based on PairNet, a Convolutional Neural Network.","authors":"Ting-Huei Chen, Chia-Jung Lee, Syue-Pu Chen, Shang-Jung Wu, Cathy S J Fann","doi":"10.1093/bioinformatics/btaf071","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf071","url":null,"abstract":"<p><strong>Summary: </strong>Polygenic risk scores (PRS) hold promise for early disease diagnosis and personalized treatment, but their overall discriminative power remains limited for many diseases in the general population. As a result, numerous novel PRS modeling techniques have been developed to improve predictive performance, but determining the most effective method for a specific application remains uncertain until tested. Hence, we introduce a novel, versatile tool for building an optimized PRS model by integrating candidate models from multiple existing PRS building methods that use target population data and/or incorporating information from other populations through a trans-ethnic approach. Our tool, PNL, is based on the PairNet algorithm, a Convolutional Neural Network with low computational complexity achieved through a simple pairing operation. In the case studies for asthma, type 2 diabetes, and vertigo, the optimal PRS model generated with PNL using only TWB data achieved AUCs that matched or improved on the best results using other methods individually. Incorporating UKBB data further improved the performance of PNL for asthma and type 2 diabetes. For vertigo, unlike the other diseases, individual method analysis showed that UKBB data alone generally produced lower AUCs compared to TWB data alone. 
As a result, incorporating UKBB data did not improve AUC with PNL, suggesting that increasing the number of candidate models does not necessarily result in higher AUC values, alleviating concerns about overfitting.</p><p><strong>Availability and implementation: </strong>The python code for PairNet algorithm incorporated in PNL is freely available on: https://github.com/FannLab/pairnet. An archived, citable version is stored on: https://doi.org/10.5281/zenodo.14838227.</p><p><strong>Contact: </strong>Correspondence should be addressed to corresponding authors.</p><p><strong>Supplementary information: </strong>Detailed implementation procedures can be found in the Supplementary Materials.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143416613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Single-cell copy number calling and event history reconstruction.
Bioinformatics (Oxford, England) Pub Date : 2025-02-13 DOI: 10.1093/bioinformatics/btaf072
Jack Kuipers, Mustafa Anıl Tuncel, Pedro F Ferreira, Katharina Jahn, Niko Beerenwinkel
{"title":"Single-cell copy number calling and event history reconstruction.","authors":"Jack Kuipers, Mustafa Anıl Tuncel, Pedro F Ferreira, Katharina Jahn, Niko Beerenwinkel","doi":"10.1093/bioinformatics/btaf072","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf072","url":null,"abstract":"<p><strong>Motivation: </strong>Copy number alterations are driving forces of tumour development and the emergence of intra-tumour heterogeneity. A comprehensive picture of these genomic aberrations is therefore essential for the development of personalised and precise cancer diagnostics and therapies. Single-cell sequencing offers the highest resolution for copy number profiling down to the level of individual cells. Recent high-throughput protocols allow for the processing of hundreds of cells through shallow whole-genome DNA sequencing. The resulting low read-depth data poses substantial statistical and computational challenges to the identification of copy number alterations.</p><p><strong>Results: </strong>We developed SCICoNE, a statistical model and MCMC algorithm tailored to single-cell copy number profiling from shallow whole-genome DNA sequencing data. SCICoNE reconstructs the history of copy number events in the tumour and uses these evolutionary relationships to identify the copy number profiles of the individual cells. 
We show the accuracy of this approach in evaluations on simulated data and demonstrate its practicability in applications to two breast cancer samples from different sequencing protocols.</p><p><strong>Availability: </strong>SCICoNE is available at https://github.com/cbg-ethz/SCICoNE.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143411868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ELLIPSIS: Robust quantification of splicing in scRNA-seq.
Bioinformatics (Oxford, England) Pub Date : 2025-02-12 DOI: 10.1093/bioinformatics/btaf028
Marie Van Hecke, Niko Beerenwinkel, Thibault Lootens, Jan Fostier, Robrecht Raedt, Kathleen Marchal
{"title":"ELLIPSIS: Robust quantification of splicing in scRNA-seq.","authors":"Marie Van Hecke, Niko Beerenwinkel, Thibault Lootens, Jan Fostier, Robrecht Raedt, Kathleen Marchal","doi":"10.1093/bioinformatics/btaf028","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf028","url":null,"abstract":"<p><strong>Motivation: </strong>Alternative splicing is a tightly regulated biological process that, due to its cell-type-specific behaviour, calls for analysis at the single-cell level. However, quantifying differential splicing in scRNA-seq is challenging due to low and uneven coverage. To this end, we developed ELLIPSIS, a tool for robust quantification of splicing in scRNA-seq that leverages locally observed read coverage with conservation of flow and intra-cell type similarity properties. It can also quantify splicing in novel splicing events, which is particularly important in cancer cells, where many novel splicing events occur.</p><p><strong>Results: </strong>Application of ELLIPSIS to simulated data shows that our method robustly estimates Percent Spliced In values and allows reliable detection of differential splicing between cell types. Using ELLIPSIS on glioblastoma scRNA-seq data, we identified genes that are differentially spliced between cancer cells in the tumor core and infiltrating cancer cells found in peripheral tissue. These genes were shown to play a role in, among others, 
cell migration and motility, cell projection organization and neuron projection guidance.</p><p><strong>Availability and implementation: </strong>ELLIPSIS quantification tool: https://github.com/MarchalLab/ELLIPSIS.git.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143400760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
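The Percent Spliced In (PSI) value mentioned in the ELLIPSIS abstract is conventionally the fraction of reads at a splicing event that support inclusion. The sketch below shows only that textbook ratio, plus pooling of counts across cells of one cell type as a simple way to cope with low per-cell coverage. It is not the ELLIPSIS method, which the abstract says additionally uses conservation of flow and intra-cell type similarity; the function names and the `min_reads` threshold are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int,
                       min_reads: int = 10) -> Optional[float]:
    """Naive PSI: inclusion reads divided by all reads covering the event.

    Returns None when total coverage is below `min_reads` (a hypothetical
    threshold), because the ratio is unreliable at low coverage.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total < min_reads:
        return None
    return inclusion_reads / total


def pooled_psi(cells: Iterable[Tuple[int, int]],
               min_reads: int = 10) -> Optional[float]:
    """Sum (inclusion, exclusion) counts over cells, then take the ratio."""
    inclusion = 0
    exclusion = 0
    for inc, exc in cells:
        inclusion += inc
        exclusion += exc
    return percent_spliced_in(inclusion, exclusion, min_reads)
```

With counts such as `[(2, 1), (3, 0), (4, 2)]`, no single cell reaches the coverage threshold, but the pooled counts do, which is the motivation for borrowing information across similar cells.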
Associations on the Fly, a new feature aiming to facilitate exploration of the Open Targets Platform evidence.
Bioinformatics (Oxford, England) Pub Date : 2025-02-12 DOI: 10.1093/bioinformatics/btaf070
C Cruz-Castillo, L Fumis, C Mehta, R E Martinez-Osorio, J M Roldan-Romero, H Cornu, P Uniyal, A Solano-Roman, M Carmona, D Ochoa, E M McDonagh, A Buniello
{"title":"Associations on the Fly, a new feature aiming to facilitate exploration of the Open Targets Platform evidence.","authors":"C Cruz-Castillo, L Fumis, C Mehta, R E Martinez-Osorio, J M Roldan-Romero, H Cornu, P Uniyal, A Solano-Roman, M Carmona, D Ochoa, E M McDonagh, A Buniello","doi":"10.1093/bioinformatics/btaf070","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf070","url":null,"abstract":"<p><strong>Motivation: </strong>The Open Targets Platform (https://platform.opentargets.org) is a unique, comprehensive, open-source resource supporting systematic identification and prioritisation of targets for drug discovery. The Platform combines, harmonises and integrates data from >20 diverse sources to provide target-disease associations, covering evidence derived from genetic associations, somatic mutations, known drugs, differential expression, animal models, pathways and systems biology. An in-house target identification scoring framework weighs the evidence from each data source and type, contributing to an overall score for each of the 7.8M target-disease associations. However, the old infrastructure did not allow user-led dynamic adjustments in the contribution of different evidence types for target prioritisation, a limitation frequently raised by our user community. 
Furthermore, the previous Platform user interface did not support navigation and exploration of the underlying target-disease evidence on the same page, occasionally making the user journey counterintuitive.</p><p><strong>Results: </strong>Here, we describe \"Associations on the Fly\" (AOTF), a new Platform feature, developed with a user-centred vision, that enables the user to formulate more flexible therapeutic hypotheses through dynamic adjustment of the weight of contributing evidence from each source, altering the prioritisation of targets.</p><p><strong>Availability and implementation: </strong>The codebases that power the Platform, including our pipelines, GraphQL API, and React UI, are all open source and licensed under the APACHE LICENSE, VERSION 2.0. You can find all of our code repositories on GitHub at https://github.com/opentargets and on Zenodo at https://zenodo.org/records/14392214. This tool was implemented using React v18 and its code is accessible here: [https://github.com/opentargets/ot-ui-apps]. The tools are accessible through the Open Targets Platform web interface [https://platform.opentargets.org/] and GraphQL API (https://platform-docs.opentargets.org/data-access/graphql-api). Data is available for download here: [https://platform.opentargets.org/downloads] and from the EMBL-EBI FTP: [https://ftp.ebi.ac.uk/pub/databases/opentargets/platform/].</p><p><strong>Contact: </strong>Annalisa Buniello, European Molecular Biology Laboratory (EMBL-EBI), buniello@ebi.ac.uk.</p><p><strong>Supplementary information: </strong>Features walkthrough video: https://youtu.be/2A9bksboAag, https://www.youtube.com/watch?v=WQwQn6I4jkw. Extensive documentation: https://platform-docs.opentargets.org/web-interface/associations-on-the-fly, https://platform-docs.opentargets.org/target-prioritisation.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-12","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143400655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
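The AOTF abstract describes letting users change the weight of each evidence source and thereby change how targets are prioritised. A minimal sketch of that idea follows, assuming a weighted harmonic-sum style aggregate in which the i-th largest weighted score is divided by i squared. The function names, the example source names, and the normalisation are illustrative assumptions and are not taken from the abstract or from the Platform's codebase.

```python
from typing import Dict, List


def association_score(source_scores: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Aggregate per-source scores in [0, 1] into one score in [0, 1].

    Each score is multiplied by its source weight (default 1.0), the
    products are sorted in descending order, and the i-th largest is
    divided by i**2. The sum is normalised by the value it would take
    if every source scored 1.0.
    """
    weighted = sorted(
        (score * weights.get(src, 1.0) for src, score in source_scores.items()),
        reverse=True,
    )
    raw = sum(v / (i ** 2) for i, v in enumerate(weighted, start=1))
    ceiling = sorted((weights.get(src, 1.0) for src in source_scores), reverse=True)
    max_raw = sum(w / (i ** 2) for i, w in enumerate(ceiling, start=1))
    return raw / max_raw if max_raw > 0 else 0.0


def rank_targets(evidence: Dict[str, Dict[str, float]],
                 weights: Dict[str, float]) -> List[str]:
    """Return target identifiers ordered from highest to lowest score."""
    return sorted(evidence,
                  key=lambda t: association_score(evidence[t], weights),
                  reverse=True)


# Hypothetical evidence for two targets and two evidence sources.
EVIDENCE = {
    "TARGET_A": {"genetic_association": 0.9, "known_drug": 0.1},
    "TARGET_B": {"genetic_association": 0.2, "known_drug": 0.8},
}
```

With default weights, `rank_targets(EVIDENCE, {})` places `TARGET_A` first; down-weighting genetic evidence with `{"genetic_association": 0.2}` moves `TARGET_B` to the top, which is the kind of re-prioritisation the abstract describes.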
A Conditional Denoising VAE-based Framework for Antimicrobial Peptides Generation with Preserving Desirable Properties.
Bioinformatics (Oxford, England) Pub Date : 2025-02-11 DOI: 10.1093/bioinformatics/btaf069
Weizhong Zhao, Kaijieyi Hou, Yiting Shen, Xiaohua Hu
{"title":"A Conditional Denoising VAE-based Framework for Antimicrobial Peptides Generation with Preserving Desirable Properties.","authors":"Weizhong Zhao, Kaijieyi Hou, Yiting Shen, Xiaohua Hu","doi":"10.1093/bioinformatics/btaf069","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf069","url":null,"abstract":"<p><strong>Motivation: </strong>The widespread use of antibiotics has led to the emergence of resistant pathogens. Antimicrobial peptides (AMPs) combat bacterial infections by disrupting the integrity of cell membranes, making it challenging for bacteria to develop resistance. Consequently, AMPs offer a promising solution to addressing antibiotic resistance. However, the limited availability of natural AMPs cannot meet the growing demand. While deep learning technologies have advanced AMP generation, conventional models often lack stability and may introduce unforeseen side effects.</p><p><strong>Results: </strong>This study presents a novel denoising VAE-based model guided by desirable physicochemical properties for AMPs generation. The model integrates key features (e.g., molecular weight, isoelectric point, hydrophobicity, etc.), and employs position encoding along with a Transformer architecture to enhance generation accuracy. A customized loss function, combining reconstruction loss, KL divergence, and property preserving loss, ensures effective model training. Additionally, the model incorporates a denoising mechanism, enabling it to learn from perturbed inputs, thus maintaining performance under limited training data. 
Experimental results demonstrate that the proposed model can generate AMPs with desirable functional properties, offering a viable approach for AMP design and analysis, which ultimately contributes to the fight against antibiotic resistance.</p><p><strong>Availability and implementation: </strong>The data and source codes are available both in GitHub (https://github.com/David-WZhao/PPGC-DVAE) and Zenodo (DOI 10.5281/zenodo.14730711).</p><p><strong>Contact and supplementary information: </strong>wzzhao@ccnu.edu.cn, and Supplementary materials are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143400564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
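The abstract above names a loss that combines a reconstruction term, a KL divergence term, and a property-preserving term. The sketch below shows one common way such a three-part objective can be assembled for a single sequence. The specific forms (per-position cross-entropy, a diagonal Gaussian posterior against a standard normal prior, mean squared error on property values) and the weights `beta` and `gamma` are assumptions for illustration; the abstract does not specify them.

```python
import math
from typing import Sequence


def reconstruction_loss(probs: Sequence[Sequence[float]],
                        target: Sequence[int]) -> float:
    """Mean negative log-probability assigned to the true residue per position."""
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(p[t] + eps) for p, t in zip(probs, target)) / len(target)


def kl_divergence(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def property_loss(predicted: Sequence[float], desired: Sequence[float]) -> float:
    """Mean squared error between predicted and desired property values."""
    return sum((p - d) ** 2 for p, d in zip(predicted, desired)) / len(desired)


def total_loss(probs, target, mu, logvar, predicted, desired,
               beta: float = 1.0, gamma: float = 1.0) -> float:
    """Weighted sum of the three terms; beta and gamma are hypothetical weights."""
    return (reconstruction_loss(probs, target)
            + beta * kl_divergence(mu, logvar)
            + gamma * property_loss(predicted, desired))
```

In a denoising setup, `probs` would come from decoding a perturbed input while `target` remains the clean sequence, so the model is trained to undo the perturbation.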
Prediction of molecular subtypes for endometrial cancer based on hierarchical foundation model.
Bioinformatics (Oxford, England) Pub Date : 2025-02-11 DOI: 10.1093/bioinformatics/btaf059
Haoyu Cui, Qinhao Guo, Jun Xu, Xiaohua Wu, Chengfei Cai, Yiping Jiao, Wenlong Ming, Hao Wen, Xiangxue Wang
{"title":"Prediction of molecular subtypes for endometrial cancer based on hierarchical foundation model.","authors":"Haoyu Cui, Qinhao Guo, Jun Xu, Xiaohua Wu, Chengfei Cai, Yiping Jiao, Wenlong Ming, Hao Wen, Xiangxue Wang","doi":"10.1093/bioinformatics/btaf059","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf059","url":null,"abstract":"<p><strong>Motivation: </strong>Endometrial cancer is a prevalent gynecological malignancy that requires accurate identification of its molecular subtypes for effective diagnosis and treatment. Four molecular subtypes with different clinical outcomes have been identified: POLE mutation, mismatch repair deficient, p53 abnormal, and no specific molecular profile. However, determining these subtypes typically relies on expensive gene sequencing. To overcome this limitation, we propose a novel method that utilizes hematoxylin and eosin-stained whole slide images to predict endometrial cancer molecular subtypes.</p><p><strong>Results: </strong>Our approach leverages a hierarchical foundation model as a backbone, fine-tuned from the UNI computational pathology foundation model, to extract tissue embedding from different scales. We have achieved promising results through extensive experimentation on the Fudan University Shanghai Cancer Center cohort (N = 364). Our model demonstrates a macro-average AUROC of 0.879 (95% CI, 0.853-0.904) in a 5-fold cross-validation. Compared to the current state-of-the-art molecular subtypes prediction for endometrial cancer, our method outperforms in terms of predictive accuracy and computational efficiency. Moreover, our method is highly reproducible, allowing for ease of implementation and widespread adoption. This study aims to address the cost and time constraints associated with traditional gene sequencing techniques. 
By providing a reliable and accessible alternative to gene sequencing, our method has the potential to revolutionize the field of endometrial cancer diagnosis and improve patient outcomes.</p><p><strong>Availability: </strong>The codes and data used for generating results in this study are available at https://github.com/HaoyuCui/hi-UNI for GitHub and https://doi.org/10.5281/zenodo.14627478 for Zenodo.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143392731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient storage and regression computation for population-scale genome sequencing studies.
Bioinformatics (Oxford, England) Pub Date : 2025-02-11 DOI: 10.1093/bioinformatics/btaf067
Manuel A Rivas, Christopher Chang
{"title":"Efficient storage and regression computation for population-scale genome sequencing studies.","authors":"Manuel A Rivas, Christopher Chang","doi":"10.1093/bioinformatics/btaf067","DOIUrl":"10.1093/bioinformatics/btaf067","url":null,"abstract":"<p><strong>Motivation: </strong>The growing availability of large-scale population biobanks has the potential to significantly advance our understanding of human health and disease. However, the massive computational and storage demands of whole genome sequencing (WGS) data pose serious challenges, particularly for underfunded institutions or researchers in developing countries. This disparity in resources can limit equitable access to cutting-edge genetic research.</p><p><strong>Results: </strong>We present novel algorithms and regression methods that dramatically reduce both computation time and storage requirements for WGS studies, with particular attention to rare variant representation. By integrating these approaches into PLINK 2.0, we demonstrate substantial gains in efficiency without compromising analytical accuracy. In an exome-wide association analysis of 19.4 million variants for the body mass index phenotype in 125,077 individuals (AllofUs project data), we reduced runtime from 695.35 minutes (11.5 hours) on a single machine to 1.57 minutes with 30 GB of memory and 50 threads (or 8.67 minutes with 4 threads). 
Additionally, the framework supports multi-phenotype analyses, further enhancing its flexibility.</p><p><strong>Availability: </strong>Our optimized methods are fully integrated into PLINK 2.0 and can be accessed at: https://www.cog-genomics.org/plink/2.0/.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143400670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0