{"title":"iCircDA-NEAE: Accelerated attribute network embedding and dynamic convolutional autoencoder for circRNA-disease associations prediction.","authors":"Lin Yuan, Jiawang Zhao, Zhen Shen, Qinhu Zhang, Yushui Geng, Chun-Hou Zheng, De-Shuang Huang","doi":"10.1371/journal.pcbi.1011344","DOIUrl":"10.1371/journal.pcbi.1011344","url":null,"abstract":"<p><p>Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease association and achieved impressive prediction performance. However, there are two main drawbacks to these methods. The first is these methods underutilize biometric information in the data. Second, the features extracted by these methods are not outstanding to represent association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE outperforms other competing methods significantly. Besides, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. 
Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.</p>","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470932/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10151643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLoS Computational Biology; Pub Date: 2023-08-31; eCollection Date: 2023-08-01; DOI: 10.1371/journal.pcbi.1011404
Olivier Dennler, François Coste, Samuel Blanquart, Catherine Belleannée, Nathalie Théret
{"title":"Phylogenetic inference of the emergence of sequence modules and protein-protein interactions in the ADAMTS-TSL family.","authors":"Olivier Dennler, François Coste, Samuel Blanquart, Catherine Belleannée, Nathalie Théret","doi":"10.1371/journal.pcbi.1011404","DOIUrl":"10.1371/journal.pcbi.1011404","url":null,"abstract":"<p><p>Numerous computational methods based on sequences or structures have been developed for the characterization of protein function, but they are still unsatisfactory to deal with the multiple functions of multi-domain protein families. Here we propose an original approach based on 1) the detection of conserved sequence modules using partial local multiple alignment, 2) the phylogenetic inference of species/genes/modules/functions evolutionary histories, and 3) the identification of co-appearances of modules and functions. Applying our framework to the multidomain ADAMTS-TSL family including ADAMTS (A Disintegrin-like and Metalloproteinase with ThromboSpondin motif) and ADAMTS-like proteins over nine species including human, we identify 45 sequence module signatures that are associated with the occurrence of 278 Protein-Protein Interactions in ancestral genes. Some of these signatures are supported by published experimental data and the others provide new insights (e.g. ADAMTS-5). The module signatures of ADAMTS ancestors notably highlight the dual variability of the propeptide and ancillary regions suggesting the importance of these two regions in the specialization of ADAMTS during evolution. Our analyses further indicate convergent interactions of ADAMTS with COMP and CCN2 proteins. 
Overall, our study provides 186 sequence module signatures that discriminate distinct subgroups of ADAMTS and ADAMTSL and that may result from selective pressures on novel functions and phenotypes.</p>","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10499240/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10587088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLoS Computational Biology; Pub Date: 2023-08-30; eCollection Date: 2023-08-01; DOI: 10.1371/journal.pcbi.1011407
Jordan Hembrow, Michael J Deeks, David M Richards
{"title":"Automatic extraction of actin networks in plants.","authors":"Jordan Hembrow, Michael J Deeks, David M Richards","doi":"10.1371/journal.pcbi.1011407","DOIUrl":"10.1371/journal.pcbi.1011407","url":null,"abstract":"<p><p>The actin cytoskeleton is essential in eukaryotes, not least in the plant kingdom where it plays key roles in cell expansion, cell division, environmental responses and pathogen defence. Yet, the precise structure-function relationships of properties of the actin network in plants are still to be unravelled, including details of how the network configuration depends upon cell type, tissue type and developmental stage. Part of the problem lies in the difficulty of extracting high-quality, quantitative measures of actin network features from microscopy data. To address this problem, we have developed DRAGoN, a novel image analysis algorithm that can automatically extract the actin network across a range of cell types, providing seventeen different quantitative measures that describe the network at a local level. Using this algorithm, we then studied a number of cases in Arabidopsis thaliana, including several different tissues, a variety of actin-affected mutants, and cells responding to powdery mildew. In many cases we found statistically-significant differences in actin network properties. 
In addition to these results, our algorithm is designed to be easily adaptable to other tissues, mutants and plants, and so will be a valuable asset for the study and future biological engineering of the actin cytoskeleton in globally-important crops.</p>","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10497154/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10238543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLoS Computational Biology; Pub Date: 2023-08-30; eCollection Date: 2023-08-01; DOI: 10.1371/journal.pcbi.1011359
Shawn A Means, Mathias W Roesler, Amy S Garrett, Leo Cheng, Alys R Clark
{"title":"Steady-state approximations for Hodgkin-Huxley cell models: Reduction of order for uterine smooth muscle cell model.","authors":"Shawn A Means, Mathias W Roesler, Amy S Garrett, Leo Cheng, Alys R Clark","doi":"10.1371/journal.pcbi.1011359","DOIUrl":"10.1371/journal.pcbi.1011359","url":null,"abstract":"<p><p>Multi-scale mathematical bioelectrical models of organs such as the uterus, stomach or heart present challenges both for accuracy and computational tractability. These multi-scale models are typically founded on models of biological cells derived from the classic Hodgkin-Huxley (HH) formalism. Ion channel behaviour is tracked with dynamical variables representing activation or inactivation of currents that relax to steady-state dependencies on cellular membrane voltage. Timescales for relaxation may be orders of magnitude faster than companion ion channel variables or phenomena of physiological interest for the entire cell (such as bursting sequences of action potentials) or the entire organ (such as electromechanical coordination). Exploiting these time scales with steady-state approximations for relatively fast-acting systems is a well-known but often overlooked approach as evidenced by recent published models. We thus investigate feasibility of an extensive reduction of order for an HH-type cell model with steady-state approximations to the full dynamical activation and inactivation ion channel variables. Our effort utilises a published comprehensive uterine smooth muscle cell model that encompasses 19 ordinary differential equations and 105 formulations overall. The numerous ion channel submodels in the published model exhibit relaxation times ranging from order 10<sup>-1</sup> to 10<sup>5</sup> milliseconds. Substitution of the faster dynamic variables with steady-state formulations demonstrates both an accurate reproduction of the full model and substantial improvements in time-to-solve, for test cases performed. 
Our demonstration here of an effective and relatively straightforward reduction method underlines the particular importance of considering time scales for model simplification before embarking on large-scale computations or parameter sweeps. As a preliminary complement to more intensive reduction of order methods such as parameter sensitivity and bifurcation analysis, this approach can rapidly and accurately improve computational tractability for challenging multi-scale organ modelling efforts.</p>","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10468033/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10153158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
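The steady-state idea summarised in the abstract above can be illustrated with a toy gating variable. This is a minimal sketch and not the published uterine smooth muscle cell model: the sigmoid `m_inf`, the prescribed slow voltage trace `voltage`, and every numeric parameter are hypothetical choices made for illustration. A gate obeying dm/dt = (m_inf(V) - m) / tau is integrated, and the worst-case gap to the substitution m = m_inf(V) is reported for a fast and a slow relaxation time.

```python
import math


def m_inf(v_mv):
    """Hypothetical steady-state activation curve (sigmoid in voltage, mV)."""
    return 1.0 / (1.0 + math.exp(-(v_mv + 40.0) / 5.0))


def voltage(t_ms):
    """Hypothetical slow voltage trace: one oscillation per 1000 ms."""
    return -50.0 + 20.0 * math.sin(2.0 * math.pi * t_ms / 1000.0)


def max_gate_error(tau_ms, t_end_ms=1000.0, dt_ms=0.01):
    """Largest |m(t) - m_inf(V(t))| for a gate with relaxation time tau_ms.

    The gate follows dm/dt = (m_inf(V) - m) / tau and is advanced with an
    exponential-Euler step, which remains stable when tau is small.
    """
    m = m_inf(voltage(0.0))
    relax = 1.0 - math.exp(-dt_ms / tau_ms)  # fraction of the gap closed per step
    worst = 0.0
    steps = int(round(t_end_ms / dt_ms))
    for k in range(1, steps + 1):
        target = m_inf(voltage(k * dt_ms))
        m += (target - m) * relax
        worst = max(worst, abs(m - target))
    return worst


if __name__ == "__main__":
    for tau in (0.1, 100.0):
        print(f"tau = {tau} ms, max error of steady-state substitution: "
              f"{max_gate_error(tau):.4f}")
```

In this toy setting the substitution is only reasonable for the fast gate, whose relaxation time is far shorter than the timescale of the voltage. That is the kind of timescale check the abstract recommends before removing a differential equation from a model.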
PLoS Computational Biology; Pub Date: 2023-08-30; eCollection Date: 2023-08-01; DOI: 10.1371/journal.pcbi.1011216
Jonathan Morgan, Alan E Lindsay
{"title":"Modulation of antigen discrimination by duration of immune contacts in a kinetic proofreading model of T cell activation with extreme statistics.","authors":"Jonathan Morgan, Alan E Lindsay","doi":"10.1371/journal.pcbi.1011216","DOIUrl":"10.1371/journal.pcbi.1011216","url":null,"abstract":"<p><p>T cells form transient cell-to-cell contacts with antigen presenting cells (APCs) to facilitate surface interrogation by membrane bound T cell receptors (TCRs). Upon recognition of molecular signatures (antigen) of pathogen, T cells may initiate an adaptive immune response. The duration of the T cell/APC contact is observed to vary widely, yet it is unclear what constructive role, if any, such variations might play in immune signaling. Modeling efforts describing antigen discrimination often focus on steady-state approximations and do not account for the transient nature of cellular contacts. Within the framework of a kinetic proofreading (KP) mechanism, we develop a stochastic First Receptor Activation Model (FRAM) describing the likelihood that a productive immune signal is produced before the expiry of the contact. Through the use of extreme statistics, we characterize the probability that the first TCR triggering is induced by a rare agonist antigen and not by that of an abundant self-antigen. We show that defining positive immune outcomes as resilience to extreme statistics and sensitivity to rare events mitigates classic tradeoffs associated with KP. By choosing a sufficient number of KP steps, our model is able to yield single agonist sensitivity whilst remaining non-reactive to large populations of self-antigen, even when self and agonist antigen are similar in dissociation rate to the TCR but differ largely in expression. Additionally, our model achieves high levels of accuracy even when agonist-positive APC encounters are rare. 
Finally, we discuss potential biological costs associated with high classification accuracy, particularly in challenging T cell environments.</p>","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10497171/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10604253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLoS Computational Biology; Pub Date: 2023-08-29; eCollection Date: 2023-08-01; DOI: 10.1371/journal.pcbi.1011393
Nikos I Bosse, Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher, Sebastian Funk
{"title":"Scoring epidemiological forecasts on transformed scales.","authors":"Nikos I Bosse, Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher, Sebastian Funk","doi":"10.1371/journal.pcbi.1011393","DOIUrl":"10.1371/journal.pcbi.1011393","url":null,"abstract":"<p><p>Forecast evaluation is essential for the development of predictive epidemic models and can inform their use for public health decision-making. Common scores to evaluate epidemiological forecasts are the Continuous Ranked Probability Score (CRPS) and the Weighted Interval Score (WIS), which can be seen as measures of the absolute distance between the forecast distribution and the observation. However, applying these scores directly to predicted and observed incidence counts may not be the most appropriate due to the exponential nature of epidemic processes and the varying magnitudes of observed values across space and time. In this paper, we argue that transforming counts before applying scores such as the CRPS or WIS can effectively mitigate these difficulties and yield epidemiologically meaningful and easily interpretable results. Using the CRPS on log-transformed values as an example, we list three attractive properties: Firstly, it can be interpreted as a probabilistic version of a relative error. Secondly, it reflects how well models predicted the time-varying epidemic growth rate. And lastly, using arguments on variance-stabilizing transformations, it can be shown that under the assumption of a quadratic mean-variance relationship, the logarithmic transformation leads to expected CRPS values which are independent of the order of magnitude of the predicted quantity. Applying a transformation of log(x + 1) to data and forecasts from the European COVID-19 Forecast Hub, we find that it changes model rankings regardless of stratification by forecast date, location or target types. 
Situations in which models missed the beginning of upward swings are more strongly emphasised while failing to predict a downturn following a peak is less severely penalised when scoring transformed forecasts as opposed to untransformed ones. We conclude that appropriate transformations, of which the natural logarithm is only one particularly attractive option, should be considered when assessing the performance of different models in the context of infectious disease incidence.</p>","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10495027/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10236556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
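The effect of scoring on a transformed scale, as described in the abstract above, can be sketched with a sample-based CRPS estimate. This is an illustrative sketch under stated assumptions and not the authors' evaluation code: the estimator CRPS = E|X - y| - 0.5 E|X - X'| is the standard sample form, while the function names and the toy forecast values are hypothetical. The same relative forecast error is scored at two orders of magnitude, once on the natural scale and once after log(x + 1).

```python
import math


def crps_from_samples(samples, observed):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    dist_to_obs = sum(abs(x - observed) for x in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return dist_to_obs - 0.5 * spread


def crps_log1p(samples, observed):
    """CRPS after applying log(x + 1) to both forecast samples and observation."""
    return crps_from_samples([math.log1p(x) for x in samples],
                             math.log1p(observed))


if __name__ == "__main__":
    # Hypothetical forecasts with the same relative spread at two magnitudes.
    small = ([80.0, 100.0, 120.0], 100.0)
    large = ([800.0, 1000.0, 1200.0], 1000.0)
    for label, (samples, obs) in (("small", small), ("large", large)):
        print(label,
              "natural-scale CRPS:", crps_from_samples(samples, obs),
              "log1p-scale CRPS:", crps_log1p(samples, obs))
```

On the natural scale the score grows in proportion to the counts, whereas on the log(x + 1) scale the two toy forecasts receive nearly the same score. This matches the abstract's reading of the log-scale CRPS as a probabilistic relative error.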
PLoS Computational Biology; Pub Date: 2023-08-28; eCollection Date: 2023-08-01; DOI: 10.1371/journal.pcbi.1011370
Wenxing Hu, Lixin Guan, Mengshan Li
{"title":"Prediction of DNA Methylation based on Multi-dimensional feature encoding and double convolutional fully connected convolutional neural network.","authors":"Wenxing Hu, Lixin Guan, Mengshan Li","doi":"10.1371/journal.pcbi.1011370","DOIUrl":"10.1371/journal.pcbi.1011370","url":null,"abstract":"DNA methylation takes on critical significance to the regulation of gene expression by affecting the stability of DNA and changing the structure of chromosomes. DNA methylation modification sites should be identified, which lays a solid basis for gaining more insights into their biological functions. Existing machine learning-based methods of predicting DNA methylation have not fully exploited the hidden multidimensional information in DNA gene sequences, such that the prediction accuracy of models is significantly limited. Besides, most models have been built in terms of a single methylation type. To address the above-mentioned issues, a deep learning-based method was proposed in this study for DNA methylation site prediction, termed the MEDCNN model. The MEDCNN model is capable of extracting feature information from gene sequences in three dimensions (i.e., positional information, biological information, and chemical information). Moreover, the proposed method employs a convolutional neural network model with double convolutional layers and double fully connected layers while iteratively updating the gradient descent algorithm using the cross-entropy loss function to increase the prediction accuracy of the model. Besides, the MEDCNN model can predict different types of DNA methylation sites. As indicated by the experimental results, the deep learning method based on coding from multiple dimensions outperformed single coding methods, and the MEDCNN model was highly applicable and outperformed existing models in predicting DNA methylation between different species. 
As revealed by the above-described findings, the MEDCNN model can be effective in predicting DNA methylation sites.","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10461834/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10119990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLoS Computational Biology; Pub Date: 2023-08-28; eCollection Date: 2023-08-01; DOI: 10.1371/journal.pcbi.1011422
Guillermo Rangel-Pineros, Alexandre Almeida, Martin Beracochea, Ekaterina Sakharova, Manja Marz, Alejandro Reyes Muñoz, Martin Hölzer, Robert D Finn
{"title":"VIRify: An integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models.","authors":"Guillermo Rangel-Pineros, Alexandre Almeida, Martin Beracochea, Ekaterina Sakharova, Manja Marz, Alejandro Reyes Muñoz, Martin Hölzer, Robert D Finn","doi":"10.1371/journal.pcbi.1011422","DOIUrl":"10.1371/journal.pcbi.1011422","url":null,"abstract":"<p><p>The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterisation of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually-curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses, and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. 
Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.</p>","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491390/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10207472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLoS Computational Biology; Pub Date: 2023-08-28; eCollection Date: 2023-08-01; DOI: 10.1371/journal.pcbi.1011392
Michael Morris, Peter Hayes, Ingemar J Cox, Vasileios Lampos
{"title":"Neural network models for influenza forecasting with associated uncertainty using Web search activity trends.","authors":"Michael Morris, Peter Hayes, Ingemar J Cox, Vasileios Lampos","doi":"10.1371/journal.pcbi.1011392","DOIUrl":"10.1371/journal.pcbi.1011392","url":null,"abstract":"<p><p>Influenza affects millions of people every year. It causes a considerable amount of medical visits and hospitalisations as well as hundreds of thousands of deaths. Forecasting influenza prevalence with good accuracy can significantly help public health agencies to timely react to seasonal or novel strain epidemics. Although significant progress has been made, influenza forecasting remains a challenging modelling task. In this paper, we propose a methodological framework that improves over the state-of-the-art forecasting accuracy of influenza-like illness (ILI) rates in the United States. We achieve this by using Web search activity time series in conjunction with historical ILI rates as observations for training neural network (NN) architectures. The proposed models incorporate Bayesian layers to produce associated uncertainty intervals to their forecast estimates, positioning themselves as legitimate complementary solutions to more conventional approaches. 
The best performing NN, referred to as the iterative recurrent neural network (IRNN) architecture, reduces mean absolute error by 10.3% and improves skill by 17.1% on average in nowcasting and forecasting tasks across 4 consecutive flu seasons.</p>","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491400/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10251469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLoS Computational Biology; Pub Date: 2023-08-28; eCollection Date: 2023-08-01; DOI: 10.1371/journal.pcbi.1011419
Zhenyu Zhang, Akihiko Nishimura, Nídia S Trovão, Joshua L Cherry, Andrew J Holbrook, Xiang Ji, Philippe Lemey, Marc A Suchard
{"title":"Accelerating Bayesian inference of dependency between mixed-type biological traits.","authors":"Zhenyu Zhang, Akihiko Nishimura, Nídia S Trovão, Joshua L Cherry, Andrew J Holbrook, Xiang Ji, Philippe Lemey, Marc A Suchard","doi":"10.1371/journal.pcbi.1011419","DOIUrl":"10.1371/journal.pcbi.1011419","url":null,"abstract":"<p><p>Inferring dependencies between mixed-type biological traits while accounting for evolutionary relationships between specimens is of great scientific interest yet remains infeasible when trait and specimen counts grow large. The state-of-the-art approach uses a phylogenetic multivariate probit model to accommodate binary and continuous traits via a latent variable framework, and utilizes an efficient bouncy particle sampler (BPS) to tackle the computational bottleneck: integrating many latent variables from a high-dimensional truncated normal distribution. This approach breaks down as the number of specimens grows and fails to reliably characterize conditional dependencies between traits. Here, we propose an inference pipeline for phylogenetic probit models that greatly outperforms BPS. The novelty lies in 1) a combination of the recent Zigzag Hamiltonian Monte Carlo (Zigzag-HMC) with linear-time gradient evaluations and 2) a joint sampling scheme for highly correlated latent variables and correlation matrix elements. In an application exploring HIV-1 evolution from 535 viruses, the inference requires joint sampling from an 11,235-dimensional truncated normal and a 24-dimensional covariance matrix. Our method yields a 5-fold speedup compared to BPS and makes it possible to learn partial correlations between candidate viral mutations and virulence. Computational speedup now enables us to tackle even larger problems: we study the evolution of influenza H1N1 glycosylations on around 900 viruses. 
For broader applicability, we extend the phylogenetic probit model to incorporate categorical traits, and demonstrate its use to study Aquilegia flower and pollinator co-evolution.</p>","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491301/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10207471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}