Bioinformatics (Oxford, England): latest articles, page 2

WDL Analysis Research Pipelines: cloud-optimized workflows for biological data processing and reproducible analysis.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf494

Kylee Degatano, Aseel Awdeh, Robert Sidney Cox, Wes Dingman, George Grant, Farzaneh Khajouei, Elizabeth Kiernan, Kishori Konwar, Kaylee L Mathews, Kevin Palis, Nikelle Petrillo, Geraldine Van der Auwera, Chengchen Rex Wang, Jessica Way

Summary: In the era of large data, the cloud is increasingly used as a computing environment, necessitating the development of cloud-compatible pipelines that can provide uniform analysis across disparate biological datasets. The WDL Analysis Research Pipelines (WARP) repository is a GitHub repository of open-source, cloud-optimized workflows for biological data processing that are semantically versioned, tested, and documented. A companion repository, WARP-Tools, hosts Docker containers and custom tools used in WARP workflows.

Availability and implementation: The WARP and WARP-Tools repositories and code are freely available at https://github.com/broadinstitute/WARP and https://github.com/broadinstitute/WARP-tools, respectively. The pipelines are available for download from the WARP repository, can be exported from Dockstore, and can be imported to a bioinformatics platform such as Terra.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490826/pdf/

Citations: 0

Predicting the trend of SARS-CoV-2 mutation frequencies using historical data.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf508

Xinyu Zhou, Yi Yan, Kevin Hu, Haixu Tang, Yijie Wang, Lu Wang, Chi Zhang, Sha Cao

Motivation: As the SARS-CoV-2 virus rapidly evolves, predicting the trajectory of viral mutations has become a critical yet complex task. A deep understanding of future mutation patterns, in particular the mutations that will prevail in the near future, is vital in steering diagnostics, therapeutics, and vaccine strategies for disease control.

Results: In this study, we developed a model to forecast future SARS-CoV-2 mutation surges in real-time, using historical mutation frequency data from the USA. We transformed the temporal prediction problem into a supervised learning framework using a sliding window approach. This involved breaking the time series of mutation frequencies into very short segments. Considering the time-dependent nature of the data, we focused on modeling the first-order derivative of the mutation frequency. We predicted the final derivative in each segment based on the preceding derivatives, employing various machine learning methods, including random forest, XGBoost, support vector machine, and neural network models. Empowered by the novel transformation strategy and the high capacity of machine learning models, we observed low prediction error that is confined within 0.1% and 1% when making predictions of mutation rates for the future 30 and 80 days, respectively. In addition, the method also led to a notable increase in prediction accuracy compared to traditional time-series models, as evidenced by much lower MAE (Mean Absolute Error) and MSE (Mean Squared Error) for predictions made within different time horizons. To further assess the method's effectiveness and robustness in predicting mutation patterns for unforeseen mutations, we first designed a synthetic case where we categorized all mutations into three major patterns. The model demonstrated its robustness by accurately predicting unseen mutation patterns when training on data from two pattern categories while testing on the third pattern category, showcasing its potential in forecasting a variety of mutation trajectories. We then applied our method to prediction for a recent time frame between 1 January 2025 and 10 June 2025, for both the USA and UK, where the model training was conducted using frequency sequence data collected between 12 December 2019 and 26 January 2023 in the USA. The model demonstrated superior performance for both datasets.

Availability and implementation: To enhance accessibility and utility, we built our methodology into a GitHub package (https://github.com/ZhouXY199502/SWD). Our method has the potential applicability to study other infectious diseases or forecasting tasks, thus extending its relevance beyond the current COVID pandemic.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12502910/pdf/
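The sliding-window transformation described in this abstract can be illustrated with a minimal sketch. It assumes the first-order derivative is approximated by consecutive differences and that each window's last value is the prediction target; the function names, the window length, and the toy frequencies are hypothetical and are not taken from the SWD package.

```python
def first_differences(freqs):
    """Approximate the first-order derivative by consecutive differences."""
    return [later - earlier for earlier, later in zip(freqs, freqs[1:])]


def sliding_windows(series, window):
    """Turn a series into (features, target) pairs.

    Each segment of length `window` contributes its first window-1 values
    as features and its final value as the target to be predicted.
    """
    pairs = []
    for start in range(len(series) - window + 1):
        segment = series[start:start + window]
        pairs.append((segment[:-1], segment[-1]))
    return pairs


# Toy mutation-frequency time series (illustrative values only).
freqs = [0.00, 0.01, 0.03, 0.06, 0.10, 0.15]
derivs = first_differences(freqs)
pairs = sliding_windows(derivs, window=3)

for features, target in pairs:
    print(features, "->", target)
```

The resulting pairs could then be passed to any supervised regressor, such as the random forest or XGBoost models the abstract mentions.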

Citations: 0

An innovative peptide toxicity prediction model based on multi-scale convolutional neural network and residual connection.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf462

Shengli Zhang, Jingyi Ren, Yunyun Liang

Motivation: Peptide toxicity is a critical concern in the development of peptide-based therapeutics, as toxic peptides can lead to severe side effects, including organ damage, immune reactions, and cytotoxicity. Predicting peptide toxicity accurately is essential to ensure the safety and efficacy of these drugs.

Results: In this study, we propose a novel model, ToxMSRC, to predict peptide toxicity using a combination of the continuous bag of words (CBOW) method from word2vec, synthetic minority over-sampling technique (SMOTE), multi-scale convolutional neural networks (CNN), and bidirectional long short-term memory (BiLSTM). This approach addresses the challenge of data imbalance by augmenting positive samples and improves feature extraction through multi-scale convolution. Furthermore, the model incorporates a residual connection that helps prevent overfitting and enhances generalization ability, improving classification performance. The model is evaluated on benchmark and independent test sets, achieving BACC scores of 92.17% on independent test1 and 86.89% on independent test2, outperforming existing state-of-the-art models. Additionally, ToxMSRC provides valuable insights into the relationship between peptide toxicity and amino acid sequences, demonstrating its potential and practical value in peptide-based drug development.

Availability and implementation: The complete datasets, source code, and pre-trained models are made available at https://github.com/Renjingyi123/ToxMSRC and https://doi.org/10.5281/zenodo.15668530.
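The BACC metric reported above is assumed here to be balanced accuracy, the mean of sensitivity and specificity, which is the usual choice for imbalanced data such as toxic versus non-toxic peptides. The sketch below shows that standard definition on hypothetical labels; it is not code from the ToxMSRC repository.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced test set: 2 toxic (1) and 8 non-toxic (0) peptides.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

bacc = balanced_accuracy(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.4f}  balanced accuracy={bacc:.4f}")
```

On this toy set plain accuracy is 0.8 while balanced accuracy is 0.6875, which shows why the balanced form is preferred when one class dominates.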

Citations: 0

An image-based protein-ligand binding representation learning framework via multi-level flexible dynamics trajectory pre-training.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf535

Hongxin Xiang, Mingquan Liu, Linlin Hou, Shuting Jin, Jianmin Wang, Jun Xia, Wenjie Du, Sisi Yuan, Xiangzheng Fu, Xinyu Yang, Li Zeng, Lei Xu

Motivation: Accurate prediction of protein-ligand binding (PLB) relationships plays a crucial role in drug discovery, which helps identify drugs that modulate the activity of specific targets. Traditional biological assays for measuring PLB relationships are time consuming and costly. In addition, models for predicting PLB relationships have been developed and widely used in drug discovery tasks. However, learning more accurate PLB representations is essential to meet the stringent standards required for drug discovery.

Results: We propose an image-based PLB representation learning framework, called ImagePLB, which equips ligand representation learner (LRL) and protein representation learner (PRL) to accept 3D multi-view ligand images and protein graphs as input, respectively, and learns rich interaction information between ligand and protein through a binding representation learner (BRL). Considering the scarcity of protein-ligand pairs, we further propose a multi-level next trajectory prediction (MLNTP) task to pre-train ImagePLB on the 4D flexible dynamics trajectory of 16 972 complexes, including ligand level, protein level, and complex level, to learn information related to trajectories. Besides, by introducing trajectory regularization (TR), we effectively alleviate the problem of high (even almost identical) feature similarity caused by adjacent trajectories. Compared with the current state-of-the-art methods, ImagePLB has achieved competitive improvements on PLB-related prediction tasks, including protein-ligand affinity and efficacy prediction tasks. This study opens the door to the image-based PLB learning paradigm.

Availability and implementation: All data and implementation details of code can be obtained from https://github.com/HongxinXiang/ImagePLB.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12502907/pdf/

Citations: 0

PRISM: privacy-preserving rare disease analysis using fully homomorphic encryption.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf468

Güliz Akkaya, Nesli Erdoğmuş, Mete Akgün

Motivation: Rare diseases affect millions of people worldwide, yet their genomic foundations remain poorly understood due to limited patient data and strict privacy regulations, such as the General Data Protection Regulation (GDPR) (https://gdpr.eu/tag/gdpr/) in March 2025. These restrictions can hinder the collaborative analysis of genomic data necessary for uncovering disease-causing variants.

Results: We present PRISM, a novel privacy-preserving framework based on fully homomorphic encryption (FHE) that facilitates rare disease variant analysis across multiple institutions without exposing sensitive genomic information. To address the challenges of centralized trust, PRISM is built upon a Threshold FHE scheme. This approach decentralizes key management across participating institutions and ensures no single entity can unilaterally decrypt sensitive data. Our method filters disease-causing variants under recessive, dominant, and de novo inheritance models entirely on encrypted data. We propose two algorithmic variants: a multiplication-intensive (MUL-IN) approach and an addition-intensive (ADD-IN) approach. The ADD-IN algorithms minimize the number of costly multiplication operations, enabling up to a 17× improvement in runtime for recessive/dominant filtering and 22× for de novo filtering, compared to MUL-IN methods. While ADD-IN produces larger ciphertexts, efficient parallelization via SIMD and multithreading allows it to handle millions of variants in reasonable time. To the best of our knowledge, this is the first study that utilizes FHE for privacy-preserving rare disease analysis across multiple inheritance models, demonstrating its practicality and scalability in a single-cloud setting.

Availability and implementation: The source code and the data used in this work can be found in https://github.com/mdppml/PRISM.git.
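To make the idea of filtering variants with arithmetic alone more concrete, the following plaintext sketch expresses a de novo filter for one family trio in two ways: as a product of 0/1 indicators and as a sum of indicators compared against the number of conditions. Everything here is an assumption for illustration: the trio encoding, the de novo rule (child carries the variant, neither parent does), and the function names are hypothetical, no encryption is performed, and these are not the MUL-IN or ADD-IN algorithms from the PRISM repository.

```python
def de_novo_mask_mul(child, mother, father):
    """Product form: 1 only if the child carries the variant and neither parent does."""
    return [c * (1 - m) * (1 - f) for c, m, f in zip(child, mother, father)]


def de_novo_mask_add(child, mother, father):
    """Sum form: count satisfied conditions and keep variants where all 3 hold."""
    return [int(c + (1 - m) + (1 - f) == 3) for c, m, f in zip(child, mother, father)]


# Hypothetical carrier indicators (1 = carries the variant) for four variants.
child = [1, 1, 0, 1]
mother = [0, 1, 0, 0]
father = [0, 0, 0, 1]

mask_mul = de_novo_mask_mul(child, mother, father)
mask_add = de_novo_mask_add(child, mother, father)
print(mask_mul, mask_add)
```

Both forms select the same variants in plaintext. Under homomorphic encryption the cost profile of additions and multiplications differs, which is the trade-off the abstract describes, although the comparison step in the sum form would itself need an encrypted-domain implementation that this sketch does not cover.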

Citations: 0

High-dimensional causal mediation analysis by partial sum statistic and sample splitting strategy in imaging genetics application.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf493

Hung-Ching Chang, Yusi Fang, Michael T Gorczyca, Kayhan Batmanghelich, George C Tseng

Summary: Causal mediation analysis investigates the role of mediators in the relationship between exposure and outcome. In the analysis of omics or imaging data, mediators are often high-dimensional, presenting challenges such as multicollinearity and interpretability. Existing methods either compromise interpretability or fail to effectively prioritize mediators. To address these challenges and advance causal mediation analysis in high-dimensional contexts, we propose the Partial Sum Statistic and Sample Splitting Strategy (PS5) framework. Through extensive simulations, we demonstrate that PS5 offers superior type I error control, higher statistical power, reduced bias in mediation effect estimation, and more accurate mediator selection. We apply PS5 to an imaging genetics dataset of chronic obstructive pulmonary disease (COPD) patients from the COPDGene study. The results show successful estimation of the global indirect effect and identification of mediating image regions. Notably, we identify a region in the lower lobe of the lung that exhibits a strong and concordant mediation effect for both genetic and environmental exposures, suggesting potential targets for treatment to mitigate COPD severity caused by genetic and smoking effects.

Availability and implementation: PS5 is publicly available at https://github.com/hung-ching-chang/PS5Med.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12502916/pdf/

Citations: 0

WinPCA: a package for windowed principal component analysis.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf529

L Moritz Blumer, Jeffrey M Good, Richard Durbin

Summary: With chromosomal reference genomes and population-scale whole genome-sequencing becoming increasingly accessible, contemporary studies often include characterizations of the genomic landscape as it varies along chromosomes, commonly termed genome scans. While traditional summary statistics like F_ST and d_XY between pre-assigned populations remain integral to characterizing the genomic divergence profile, PCA differs by providing single-sample resolution, thereby supporting the identification of polymorphic inversions, introgression and other types of divergent sequence that may not be fully aligned with global population structure. Here, we introduce WinPCA, a user-friendly package to compute, polarize and visualize genetic principal components in windows along the genome. To accommodate low-coverage whole genome-sequencing datasets, WinPCA can optionally make use of PCAngsd methods to compute principal components in a genotype likelihood framework. WinPCA accepts variant data in either VCF or BEAGLE format and can generate rich plots for interactive data exploration and downstream presentation.

Availability and implementation: WinPCA is implemented in Python and freely available at https://github.com/MoritzBlumer/winpca and https://doi.org/10.5281/zenodo.15614979.
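The general idea of computing principal components in windows along the genome can be sketched in plain Python. This is a toy illustration under stated assumptions: genotypes are given as per-sample allele counts, windows are fixed-size and non-overlapping, PC1 is obtained by power iteration on the sample-by-sample Gram matrix, and no polarization across windows is attempted. The function names and data are hypothetical and do not reflect WinPCA's actual implementation or interface.

```python
def pc1_scores(genotypes, iterations=200):
    """Return per-sample PC1 scores (up to sign and scale) for one window.

    genotypes: list of samples, each a list of allele counts per variant.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Sample-by-sample Gram matrix; its leading eigenvector is proportional to PC1 scores.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered] for r1 in centered]
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        vec = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = sum(v * v for v in vec) ** 0.5
        if norm == 0:
            return [0.0] * n  # no variation in this window
        vec = [v / norm for v in vec]
    return vec


def windowed_pc1(genotypes, window_size):
    """Compute PC1 scores separately for consecutive, non-overlapping variant windows."""
    m = len(genotypes[0])
    return [
        pc1_scores([row[start:start + window_size] for row in genotypes])
        for start in range(0, m, window_size)
    ]


# Four samples, six variants: the sample grouping differs between the two windows.
genotypes = [
    [0, 0, 0, 2, 2, 2],
    [0, 0, 0, 0, 0, 0],
    [2, 2, 2, 2, 2, 2],
    [2, 2, 2, 0, 0, 0],
]
window_1, window_2 = windowed_pc1(genotypes, window_size=3)
print(window_1)
print(window_2)
```

In the first window samples 0 and 1 fall on one side of PC1 and samples 2 and 3 on the other, while in the second window the grouping changes to samples 0 and 2 versus 1 and 3. Local shifts of this kind are what a windowed scan is meant to reveal.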

Citations: 0

AXZ viewer: a web application to visualize unprocessed AFM-IR data.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf513

Wouter Duverger, Georg Ramer, Nikolaos Louros, Joost Schymkowitz, Frederic Rousseau

Motivation: Atomic Force Microscopy-based Infrared spectroscopy (AFM-IR) is a novel and innovative method for label-free high-resolution structural biology. However, the nature of the data files generated by AFM-IR instruments precludes investigation by conventional open-source scientific image analysis software suites. As a result, reporting of AFM-IR datasets is not standardized and the data itself is difficult to audit.

Results: We have developed a web application that allows anyone to open, review, and audit raw AFM-IR data files easily and without deep knowledge of the method. It also exposes all metadata recorded by the microscope at the time of measurement. The web application is based on a Python package that supports custom data analyses within the scientific Python ecosystem. This tool provides an accessible, transparent solution for AFM-IR data review, with the potential to support reproducibility and standardization in AFM-IR research and encourage wider adoption of this innovative spectroscopy method.

Availability and implementation: The web app is hosted at https://anasys-python-tools-gui.streamlit.app. Its source code is listed at https://github.com/wduverger/anasys-python-tools-gui. The underlying Python package is available at https://github.com/GeorgRamer/anasys-python-tools and can be installed using pip.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490829/pdf/

Citations: 0

MPAC: a computational framework for inferring pathway activities from multi-omic data.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf490

Peng Liu, David Page, Paul Ahlquist, Irene M Ong, Anthony Gitter

Motivation: Fully capturing cellular state requires examining genomic, epigenomic, transcriptomic, proteomic, and other assays for a biological sample and comprehensive computational modeling to reason with the complex and sometimes conflicting measurements. Modeling these so-called multi-omic data is especially beneficial in disease analysis, where observations across omic data types may reveal unexpected patient groupings and inform clinical outcomes and treatments.

Results: We present Multi-omic Pathway Analysis of Cells (MPAC), a computational framework that interprets multi-omic data through prior knowledge from biological pathways. MPAC leverages network relationships encoded in pathways through a factor graph to infer consensus activity levels for proteins and associated pathway entities from multi-omic data, runs permutation testing to eliminate spurious activity predictions, and groups biological samples by pathway activities to allow identifying and prioritizing proteins with potential clinical relevance, e.g. associated with patient prognosis. Using DNA copy number alteration and RNA-seq data from head and neck squamous cell carcinoma patients from The Cancer Genome Atlas as an example, we demonstrate that MPAC predicts a patient subgroup related to immune responses not identified by analysis with either input omic data type alone. Key proteins identified via this subgroup have pathway activities related to clinical outcome as well as immune cell composition. Our MPAC R package enables similar multi-omic analyses on new datasets.

Availability and implementation: The MPAC package is available at Bioconductor https://bioconductor.org/packages/MPAC.

Volume 41, issue 10. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12496133/pdf/
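The permutation-testing step mentioned in this abstract follows a general pattern that can be sketched generically: compute a statistic on the real data, recompute it many times on shuffled data, and ask how often the shuffled values are at least as extreme. The sketch below uses a simple difference in group means as a stand-in statistic; the data, the statistic, and the function names are hypothetical, and this is not the procedure implemented in the MPAC R package.

```python
import random


def mean_difference(values, labels):
    """Difference between the mean of group 1 and the mean of group 0."""
    group_1 = [v for v, l in zip(values, labels) if l == 1]
    group_0 = [v for v, l in zip(values, labels) if l == 0]
    return sum(group_1) / len(group_1) - sum(group_0) / len(group_0)


def permutation_p_value(observed, null_scores):
    """Two-sided empirical p-value with the usual +1 correction."""
    extreme = sum(1 for s in null_scores if abs(s) >= abs(observed))
    return (extreme + 1) / (len(null_scores) + 1)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
values = [2.1, 1.9, 2.3, 2.0, 0.1, -0.2, 0.0, 0.2]  # hypothetical activity scores
labels = [1, 1, 1, 1, 0, 0, 0, 0]

observed = mean_difference(values, labels)
null_scores = []
for _ in range(999):
    shuffled = labels[:]
    rng.shuffle(shuffled)  # break the link between scores and labels
    null_scores.append(mean_difference(values, shuffled))

p_value = permutation_p_value(observed, null_scores)
print(f"observed={observed:.3f}  p={p_value:.3f}")
```

A prediction whose statistic is rarely matched under shuffling receives a small p-value and is kept, while one that shuffled data reproduces easily would be discarded as spurious.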

Citations: 0

HIPSTR: highest independent posterior subtree reconstruction in TreeAnnotator X.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf488

Guy Baele, Luiz M Carvalho, Marius Brusselmans, Gytis Dudas, Xiang Ji, John T McCrone, Philippe Lemey, Marc A Suchard, Andrew Rambaut

Summary: In Bayesian phylogenetic and phylodynamic studies, it is common to summarize the posterior distribution of trees with a time-calibrated summary phylogeny. While the maximum clade credibility (MCC) tree is often used for this purpose, we here show that a novel summary tree method, the highest independent posterior subtree reconstruction (HIPSTR), contains consistently higher supported clades than MCC. We also provide faster computational routines for estimating both summary trees in an updated version of TreeAnnotator X, an open-source software program that summarizes the information from a sample of trees and returns many helpful statistics such as individual clade credibilities contained in the summary tree.

Results: HIPSTR and MCC reconstructions on two Ebola virus and two SARS-CoV-2 datasets show that HIPSTR yields summary trees that consistently contain clades with higher support compared to MCC trees. The MCC trees regularly fail to include several clades with very high posterior probability (≥0.95) as well as a large number of clades with moderate to high posterior probability (≥50%), whereas HIPSTR, and in particular its majority-rule extension MrHIPSTR, achieves near-perfect performance in this respect. HIPSTR and MrHIPSTR also exhibit favourable computational performance over MCC in TreeAnnotator X. Comparison to the recent CCD0-MAP algorithm yielded mixed results and requires a more in-depth investigation in follow-up studies.

Availability and implementation: TreeAnnotator X is available as part of the BEAST X (v10.5.0) software package, available at https://github.com/beast-dev/beast-mcmc/releases, and on Zenodo (DOI: https://doi.org/10.5281/zenodo.4895234).

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490824/pdf/
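For readers unfamiliar with the MCC baseline discussed here, the sketch below shows the standard definition: each clade's credibility is the fraction of sampled trees that contain it, and the MCC tree is the sampled tree with the highest product of clade credibilities. It assumes each tree is represented simply as a set of clades, with each clade a set of taxon names; that representation and the function names are hypothetical. The sketch does not implement HIPSTR, whose algorithm is not described in the abstract.

```python
import math
from collections import Counter


def clade_credibilities(trees):
    """Fraction of sampled trees in which each clade appears."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def mcc_tree(trees):
    """Sampled tree maximizing the product (sum of logs) of its clade credibilities."""
    cred = clade_credibilities(trees)
    return max(trees, key=lambda tree: sum(math.log(cred[c]) for c in tree))


def clade(*taxa):
    return frozenset(taxa)


# Hypothetical posterior sample of four rooted trees on taxa A-D,
# each reduced to its non-trivial clades.
tree_1 = frozenset({clade("A", "B"), clade("A", "B", "C")})
tree_2 = frozenset({clade("A", "B"), clade("C", "D")})
tree_3 = frozenset({clade("A", "C"), clade("A", "B", "C")})
trees = [tree_1, tree_1, tree_2, tree_3]

cred = clade_credibilities(trees)
best = mcc_tree(trees)
print(cred[clade("A", "B")], best == tree_1)
```

Because the MCC tree must be one of the sampled trees, it can miss a well-supported clade when no single sample combines all of them, which is the limitation the abstract reports.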

Citations: 0