Baike She, Rebecca Lee Smith, Ian Pytlarz, Shreyas Sundaram, Philip E Paré
{"title":"A framework for counterfactual analysis, strategy evaluation, and control of epidemics using reproduction number estimates.","authors":"Baike She, Rebecca Lee Smith, Ian Pytlarz, Shreyas Sundaram, Philip E Paré","doi":"10.1371/journal.pcbi.1012569","DOIUrl":"https://doi.org/10.1371/journal.pcbi.1012569","url":null,"abstract":"<p><p>During pandemics, countries, regions, and communities develop various epidemic models to evaluate spread and guide mitigation policies. However, model uncertainties caused by complex transmission behaviors, contact-tracing networks, time-varying parameters, human factors, and limited data present significant challenges to model-based approaches. To address these issues, we propose a novel framework that centers around reproduction number estimates to perform counterfactual analysis, strategy evaluation, and feedback control of epidemics. The framework 1) introduces a mechanism to quantify the impact of the testing-for-isolation intervention strategy on the basic reproduction number. Building on this mechanism, the framework 2) proposes a method to reverse engineer the effective reproduction number under different strengths of the intervention strategy. In addition, based on the method that quantifies the impact of the testing-for-isolation strategy on the basic reproduction number, the framework 3) proposes a closed-loop control algorithm that uses the effective reproduction number both as feedback to indicate the severity of the spread and as the control goal to guide adjustments in the intensity of the intervention. We illustrate the framework, along with its three core methods, by addressing three key questions and validating its effectiveness using data collected during the COVID-19 pandemic at the University of Illinois Urbana-Champaign (UIUC) and Purdue University: 1) How severe would an outbreak have been without the implemented intervention strategies? 
2) What impact would varying the intervention strength have had on an outbreak? 3) How can we adjust the intervention intensity based on the current state of an outbreak?</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 11","pages":"e1012569"},"PeriodicalIF":3.8,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142682566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLoS Computational Biology, Pub Date: 2024-11-20, eCollection Date: 2024-11-01, DOI: 10.1371/journal.pcbi.1012543
Hyun Joo Ji, Steven L Salzberg
{"title":"Upstream open reading frames may contain hundreds of novel human exons.","authors":"Hyun Joo Ji, Steven L Salzberg","doi":"10.1371/journal.pcbi.1012543","DOIUrl":"10.1371/journal.pcbi.1012543","url":null,"abstract":"<p><p>Several recent studies have presented evidence that the human gene catalogue should be expanded to include thousands of short open reading frames (ORFs) appearing upstream or downstream of existing protein-coding genes, each of which might create an additional bicistronic transcript in humans. Here we explore an alternative hypothesis that would explain the translational and evolutionary evidence for these upstream ORFs without the need to create novel genes or bicistronic transcripts. We examined 2,199 upstream ORFs that have been proposed as high-quality candidates for novel genes, to determine if they could instead represent protein-coding exons that can be added to existing genes. We checked for the conservation of these ORFs in four recently sequenced, high-quality human genomes, and found a large majority (87.8%) to be conserved in all four as expected. We then looked for splicing evidence that would connect each upstream ORF to the downstream protein-coding gene at the same locus, thus creating a novel splicing variant using the upstream ORF as its first exon. These protein coding exon candidates were further evaluated using protein structure predictions of the protein sequences that included the proposed new exons. 
We determined that 541 out of 2,199 upstream ORFs have strong evidence that they can form protein coding exons that are part of an existing gene, and that the resulting protein is predicted to have similar or better structural quality than the currently annotated isoform.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 11","pages":"e1012543"},"PeriodicalIF":3.8,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11578521/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142682573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dan Liu, Francesca Young, Kieran D Lamb, David L Robertson, Ke Yuan
{"title":"Prediction of virus-host associations using protein language models and multiple instance learning.","authors":"Dan Liu, Francesca Young, Kieran D Lamb, David L Robertson, Ke Yuan","doi":"10.1371/journal.pcbi.1012597","DOIUrl":"https://doi.org/10.1371/journal.pcbi.1012597","url":null,"abstract":"<p><p>Predicting virus-host associations is essential to determine the specific host species that viruses interact with, and discover if new viruses infect humans and animals. Currently, the host of the majority of viruses is unknown, particularly in microbiomes. To address this challenge, we introduce EvoMIL, a deep learning method that predicts the host species for viruses from viral sequences only. It also identifies important viral proteins that significantly contribute to host prediction. The method combines a pre-trained large protein language model (ESM) and attention-based multiple instance learning to allow protein-orientated predictions. Our results show that protein embeddings capture stronger predictive signals than sequence composition features, including amino acids, physiochemical properties, and DNA k-mers. In multi-host prediction tasks, EvoMIL achieves median F1 score improvements of 10.8%, 16.2%, and 4.9% in prokaryotic hosts, and 1.7%, 6.6% and 11.5% in eukaryotic hosts. EvoMIL binary classifiers achieve impressive AUC over 0.95 for all prokaryotic hosts and range from roughly 0.8 to 0.9 for eukaryotic hosts. Furthermore, EvoMIL identifies important proteins in the prediction task. 
We found that they capture key functions in virus-host specificity.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 11","pages":"e1012597"},"PeriodicalIF":3.8,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142676604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
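The attention-based multiple instance learning step mentioned in the EvoMIL abstract can be illustrated with a minimal sketch. It assumes a common attention-pooling formulation (softmax over `w · tanh(V h_k)`), random vectors in place of real ESM protein embeddings, and untrained random parameters; the function name `attention_mil_pool` and all shapes are hypothetical and are not taken from the paper.

```python
import numpy as np

def attention_mil_pool(H, V, w):
    """Pool a bag of instance embeddings into one bag embedding.

    H: (n_instances, d) embeddings, e.g. one row per viral protein (assumed).
    V: (d, k) and w: (k,) are attention parameters (random here, learned in practice).
    Returns the bag embedding (d,) and the attention weights (n_instances,).
    """
    scores = np.tanh(H @ V) @ w          # one unnormalised score per instance
    scores = scores - scores.max()       # stabilise the softmax numerically
    weights = np.exp(scores)
    weights = weights / weights.sum()    # attention weights sum to 1
    return weights @ H, weights

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))              # 5 hypothetical proteins, 8-dim embeddings
V = rng.normal(size=(8, 4))
w = rng.normal(size=4)
bag, weights = attention_mil_pool(H, V, w)
top_protein = int(np.argmax(weights))    # instance contributing most to the bag
```

In a trained model of this kind, the attention weights are what would let one rank proteins by their contribution to a host prediction; here they only demonstrate the mechanics.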
Evandro Konzen, Richard J Delahay, Dave J Hodgson, Robbie A McDonald, Ellen Brooks Pollock, Simon E F Spencer, Trevelyan J McKinley
{"title":"Efficient modelling of infectious diseases in wildlife: A case study of bovine tuberculosis in wild badgers.","authors":"Evandro Konzen, Richard J Delahay, Dave J Hodgson, Robbie A McDonald, Ellen Brooks Pollock, Simon E F Spencer, Trevelyan J McKinley","doi":"10.1371/journal.pcbi.1012592","DOIUrl":"https://doi.org/10.1371/journal.pcbi.1012592","url":null,"abstract":"<p><p>Bovine tuberculosis (bTB) has significant socio-economic and welfare impacts on the cattle industry in parts of the world. In the United Kingdom and Ireland, disease control is complicated by the presence of infection in wildlife, principally the European badger. Control strategies tend to be applied to whole populations, but better identification of key sources of transmission, whether individuals or groups, could help inform more efficient approaches. Mechanistic transmission models can be used to better understand key epidemiological drivers of disease spread and identify high-risk individuals and groups if they can be adequately fitted to observed data. However, this is a significant challenge, especially within wildlife populations, because monitoring relies on imperfect diagnostic test information, and even under systematic surveillance efforts (such as capture-mark-recapture sampling) epidemiological events are only partially observed. To this end we develop a stochastic compartmental model of bTB transmission, and fit this to individual-level data from a unique > 40-year longitudinal study of 2,391 badgers using a recently developed individual forward filtering backward sampling algorithm. Modelling challenges are further compounded by spatio-temporal meta-population structures and age-dependent mortality. We develop a novel estimator for the individual effective reproduction number that provides quantitative evidence for the presence of superspreader badgers, despite the population-level effective reproduction number being less than one. 
We also infer measures of the hidden burden of infection in the host population through time; the relative likelihoods of competing routes of transmission; effective and realised infectious periods; and longitudinal measures of diagnostic test performance. This modelling framework provides an efficient and generalisable way to fit state-space models to individual-level data in wildlife populations, which allows identification of high-risk individuals and exploration of important epidemiological questions about bTB and other wildlife diseases.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 11","pages":"e1012592"},"PeriodicalIF":3.8,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142676546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beren Millidge, Yuhang Song, Armin Lak, Mark E Walton, Rafal Bogacz
{"title":"Reward Bases: A simple mechanism for adaptive acquisition of multiple reward types.","authors":"Beren Millidge, Yuhang Song, Armin Lak, Mark E Walton, Rafal Bogacz","doi":"10.1371/journal.pcbi.1012580","DOIUrl":"https://doi.org/10.1371/journal.pcbi.1012580","url":null,"abstract":"<p><p>Animals can adapt their preferences for different types of reward according to physiological state, such as hunger or thirst. To explain this ability, we employ a simple multi-objective reinforcement learning model that learns multiple values according to different reward dimensions such as food or water. We show that by weighting these learned values according to the current needs, behaviour may be flexibly adapted to present preferences. This model predicts that individual dopamine neurons should encode the errors associated with some reward dimensions more than with others. To provide a preliminary test of this prediction, we reanalysed a small dataset obtained from a single primate in an experiment which to our knowledge is the only published study where the responses of dopamine neurons to stimuli predicting distinct types of rewards were recorded. We observed that in addition to subjective economic value, dopamine neurons encode a gradient of reward dimensions; some neurons respond most to stimuli predicting food rewards while the others respond more to stimuli predicting fluids. We also proposed a possible implementation of the model in the basal ganglia network, and demonstrated how the striatal system can learn values in multiple dimensions, even when dopamine neurons encode mixtures of prediction error from different dimensions. Additionally, the model reproduces the instant generalisation to new physiological states seen in dopamine responses and in behaviour. 
Our results demonstrate how a simple neural circuit can flexibly guide behaviour according to animals' needs.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 11","pages":"e1012580"},"PeriodicalIF":3.8,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142676607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
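The core idea in the Reward Bases abstract, learning one value per reward dimension and weighting the learned values by current needs, can be sketched as follows. This is a toy illustration under stated assumptions: tabular TD(0) learning, two cue states, and hypothetical names (`learn_reward_bases`, `combined_value`, the `needs` weights); it is not the authors' implementation.

```python
import numpy as np

def learn_reward_bases(transitions, n_states, n_rewards,
                       alpha=0.1, gamma=0.9, epochs=200):
    """TD(0) learning of a separate value vector for each reward dimension."""
    V = np.zeros((n_rewards, n_states))
    for _ in range(epochs):
        for s, s_next, r_vec in transitions:
            # Terminal transitions (s_next is None) bootstrap from zero.
            bootstrap = 0.0 if s_next is None else V[:, s_next]
            delta = np.asarray(r_vec) + gamma * bootstrap - V[:, s]
            V[:, s] += alpha * delta     # one prediction error per dimension
    return V

def combined_value(V, needs):
    """Weight the learned value bases by the current physiological needs."""
    return np.asarray(needs) @ V

# Toy task (assumed): state 0 is a cue followed by food, state 1 by water.
transitions = [(0, None, (1.0, 0.0)), (1, None, (0.0, 1.0))]
V = learn_reward_bases(transitions, n_states=2, n_rewards=2)

hungry = combined_value(V, needs=(1.0, 0.2))
thirsty = combined_value(V, needs=(0.2, 1.0))
```

Switching from `hungry` to `thirsty` changes the preferred cue without any further learning, which mirrors the instant generalisation to new physiological states described in the abstract.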
Baptiste Ruiz, Arnaud Belcour, Samuel Blanquart, Sylvie Buffet-Bataillon, Isabelle Le Huërou-Luron, Anne Siegel, Yann Le Cunff
{"title":"SPARTA: Interpretable functional classification of microbiomes and detection of hidden cumulative effects.","authors":"Baptiste Ruiz, Arnaud Belcour, Samuel Blanquart, Sylvie Buffet-Bataillon, Isabelle Le Huërou-Luron, Anne Siegel, Yann Le Cunff","doi":"10.1371/journal.pcbi.1012577","DOIUrl":"10.1371/journal.pcbi.1012577","url":null,"abstract":"<p><p>The composition of the gut microbiota is a known factor in various diseases and has proven to be a strong basis for automatic classification of disease state. A need for a better understanding of microbiota data on the functional scale has since been voiced, as it would enhance these approaches' biological interpretability. In this paper, we have developed a computational pipeline for integrating the functional annotation of the gut microbiota into an automatic classification process and facilitating downstream interpretation of its results. The process takes as input taxonomic composition data, which can be built from 16S or whole genome sequencing, and links each component to its functional annotations through interrogation of the UniProt database. A functional profile of the gut microbiota is built from this basis. Both profiles, microbial and functional, are used to train Random Forest classifiers to discern unhealthy from control samples. SPARTA ensures full reproducibility and exploration of inherent variability by extending state-of-the-art methods in three dimensions: increased number of trained random forests, selection of important variables with an iterative process, repetition of full selection process from different seeds. This process shows that the translation of the microbiota into functional profiles gives non-significantly different performances when compared to microbial profiles on 5 of 6 datasets. This approach's main contribution however stems from its interpretability rather than its performance: through repetition, it also outputs a robust subset of discriminant variables. 
These selections were shown to be more consistent than those obtained by a state-of-the-art method, and their contents were validated through manual bibliographic research. The interconnections between selected taxa and functional annotations were also analyzed and revealed that important annotations emerge from the cumulated influence of non-selected taxa.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 11","pages":"e1012577"},"PeriodicalIF":3.8,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142668958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jørgen Ankill, Zhi Zhao, Xavier Tekpli, Elin H Kure, Vessela N Kristensen, Anthony Mathelier, Thomas Fleischer
{"title":"Integrative pan-cancer analysis reveals a common architecture of dysregulated transcriptional networks characterized by loss of enhancer methylation.","authors":"Jørgen Ankill, Zhi Zhao, Xavier Tekpli, Elin H Kure, Vessela N Kristensen, Anthony Mathelier, Thomas Fleischer","doi":"10.1371/journal.pcbi.1012565","DOIUrl":"10.1371/journal.pcbi.1012565","url":null,"abstract":"<p><p>Aberrant DNA methylation contributes to gene expression deregulation in cancer. However, these alterations' precise regulatory role and clinical implications are still not fully understood. In this study, we performed expression-methylation Quantitative Trait Loci (emQTL) analysis to identify deregulated cancer-driving transcriptional networks linked to CpG demethylation pan-cancer. By analyzing 33 cancer types from The Cancer Genome Atlas, we identified and confirmed significant correlations between CpG methylation and gene expression (emQTL) in cis and trans, both across and within cancer types. Bipartite network analysis of the emQTL revealed groups of CpGs and genes related to important biological processes involved in carcinogenesis including proliferation, metabolism and hormone-signaling. These bipartite communities were characterized by loss of enhancer methylation in specific transcription factor binding regions (TFBRs) and the CpGs were topologically linked to upregulated genes through chromatin loops. Penalized Cox regression analysis showed a significant prognostic impact of the pan-cancer emQTL in many cancer types. 
Taken together, our integrative pan-cancer analysis reveals a common architecture where hallmark cancer-driving functions are affected by the loss of enhancer methylation and may be epigenetically regulated.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 11","pages":"e1012565"},"PeriodicalIF":3.8,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142668888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Salah Bazzi, Stephan Stansfield, Neville Hogan, Dagmar Sternad
{"title":"Simplified internal models in human control of complex objects.","authors":"Salah Bazzi, Stephan Stansfield, Neville Hogan, Dagmar Sternad","doi":"10.1371/journal.pcbi.1012599","DOIUrl":"10.1371/journal.pcbi.1012599","url":null,"abstract":"<p><p>Humans are skillful at manipulating objects that possess nonlinear underactuated dynamics, such as clothes or containers filled with liquids. Several studies suggested that humans implement a predictive model-based strategy to control such objects. However, these studies only considered unconstrained reaching without any object involved or, at most, linear mass-spring systems with relatively simple dynamics. It is not clear what internal model humans develop of more complex objects, and what level of granularity is represented. To answer these questions, this study examined a task where participants physically interacted with a nonlinear underactuated system mimicking a cup of sloshing coffee: a cup with a ball rolling inside. The cup and ball were simulated in a virtual environment and subjects interacted with the system via a haptic robotic interface. Participants were instructed to move the system and arrive at a target region with both cup and ball at rest, 'zeroing out' residual oscillations of the ball. This challenging task affords a solution known as 'input shaping', whereby a series of pulses moves the dynamic object to the target leaving no residual oscillations. Since the timing and amplitude of these pulses depend on the controller's internal model of the object, input shaping served as a tool to identify the subjects' internal representation of the cup-and-ball. Five simulations with different internal models were compared against the human data. Results showed that the features in the data were correctly predicted by a simple internal model that represented the cup-and-ball as a single rigid mass coupled to the hand impedance. 
These findings provide evidence that humans use simplified internal models along with mechanical impedance to manipulate complex objects.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 11","pages":"e1012599"},"PeriodicalIF":3.8,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142668956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
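Input shaping, as described in the abstract above, can be illustrated with the textbook two-impulse zero-vibration (ZV) shaper for a single damped oscillatory mode. The sketch below is an assumption-laden example, not the controller used in the study: the function names, the natural frequency, and the damping ratio are all hypothetical. It shows why pulse timing and amplitude depend on the assumed model: a shaper designed for the correct mode cancels residual oscillation, while one designed for a mismatched frequency does not.

```python
import math

def zv_shaper(omega_n, zeta):
    """Two-impulse ZV shaper for natural frequency omega_n and damping ratio zeta."""
    omega_d = omega_n * math.sqrt(1.0 - zeta ** 2)      # damped frequency
    K = math.exp(-zeta * math.pi / math.sqrt(1.0 - zeta ** 2))
    amplitudes = [1.0 / (1.0 + K), K / (1.0 + K)]       # amplitudes sum to 1
    times = [0.0, math.pi / omega_d]                    # half a damped period apart
    return amplitudes, times

def residual_vibration(amplitudes, times, omega_n, zeta):
    """Residual oscillation amplitude of the true mode after the last impulse."""
    omega_d = omega_n * math.sqrt(1.0 - zeta ** 2)
    c = sum(a * math.exp(zeta * omega_n * t) * math.cos(omega_d * t)
            for a, t in zip(amplitudes, times))
    s = sum(a * math.exp(zeta * omega_n * t) * math.sin(omega_d * t)
            for a, t in zip(amplitudes, times))
    return math.exp(-zeta * omega_n * times[-1]) * math.hypot(c, s)

omega_true, zeta = 2.0 * math.pi, 0.05                  # assumed 1 Hz mode
amps, times = zv_shaper(omega_true, zeta)               # shaper built on the true model
matched = residual_vibration(amps, times, omega_true, zeta)
# Same shaper applied to a system whose frequency is 30% higher than assumed.
mismatched = residual_vibration(amps, times, 1.3 * omega_true, zeta)
```

Because the residual grows with the mismatch between the assumed and the actual dynamics, comparing observed pulse timing against candidate models is one way such a shaper can serve as a probe of the underlying internal model.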
{"title":"Storm: Incorporating transient stochastic dynamics to infer the RNA velocity with metabolic labeling information.","authors":"Qiangwei Peng, Xiaojie Qiu, Tiejun Li","doi":"10.1371/journal.pcbi.1012606","DOIUrl":"10.1371/journal.pcbi.1012606","url":null,"abstract":"<p><p>The time-resolved scRNA-seq (tscRNA-seq) provides the possibility to infer physically meaningful kinetic parameters, e.g., the transcription, splicing or RNA degradation rate constants with correct magnitudes, and RNA velocities by incorporating temporal information. Previous approaches utilizing the deterministic dynamics and steady-state assumption on gene expression states are insufficient to achieve favorable results for the data involving transient process. We present a dynamical approach, Storm (Stochastic models of RNA metabolic-labeling), to overcome these limitations by solving stochastic differential equations of gene expression dynamics. The derivation reveals that the new mRNA sequencing data obeys different types of cell-specific Poisson distributions when jointly considering both biological and cell-specific technical noise. Storm deals with measured counts data directly and extends the RNA velocity methodology based on metabolic labeling scRNA-seq data to transient stochastic systems. Furthermore, we relax the constant parameter assumption over genes/cells to obtain gene-cell-specific transcription/splicing rates and gene-specific degradation rates, thus revealing time-dependent and cell-state-specific transcriptional regulations. 
Storm will facilitate the study of the statistical properties of tscRNA-seq data, eventually advancing our understanding of the dynamic transcription regulation during development and disease.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 11","pages":"e1012606"},"PeriodicalIF":3.8,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142668960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fei Cai, Yuehua Wei, Daniel Kirchhofer, Andrew Chang, Yingnan Zhang
{"title":"Rapid prediction of key residues for foldability by machine learning model enables the design of highly functional libraries with hyperstable constrained peptide scaffolds.","authors":"Fei Cai, Yuehua Wei, Daniel Kirchhofer, Andrew Chang, Yingnan Zhang","doi":"10.1371/journal.pcbi.1012609","DOIUrl":"10.1371/journal.pcbi.1012609","url":null,"abstract":"<p><p>Peptides are an emerging modality for developing therapeutics that can either agonize or antagonize cellular pathways associated with disease, yet peptides often suffer from poor chemical and physical stability, which limits their potential. However, naturally occurring disulfide-constrained peptides (DCPs) and de novo designed Hyperstable Constrained Peptides (HCPs) exhibit highly stable and drug-like scaffolds, making them attractive therapeutic modalities. Previously, we established a robust platform for discovering peptide therapeutics by utilizing multiple DCPs as scaffolds. However, we realized that those libraries could be further improved by considering the foldability of peptide scaffolds for library design. We hypothesized that specific sequence patterns within the peptide scaffolds played a crucial role in spontaneous folding into a stable topology, and thus, these sequences should not be subject to randomization in the original library design. Therefore, we developed a method for designing highly diverse DCP libraries while preserving the inherent foldability of each scaffold. To achieve this, we first generated a large-scale dataset from yeast surface display (YSD) combined with shotgun alanine scan experiments to train a machine-learning (ML) model based on techniques used for natural language understanding. Then we validated the ML model with experiments, showing that it is able to not only predict the foldability of peptides with high accuracy across a broad range of sequences but also pinpoint residues critical for foldability. 
Using the insights gained from the alanine scanning experiment as well as the prediction model, we designed a new peptide library based on a de novo-designed HCP, which was optimized for enhanced folding efficiency. Subsequent panning trials using this library yielded promising hits having good folding properties. In summary, this work advances peptide or small protein domain library design practices. These findings could pave the way for the efficient development of peptide-based therapeutics in the future.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 11","pages":"e1012609"},"PeriodicalIF":3.8,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142668954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}