Algorithms for Molecular Biology: Latest Articles

Approximation algorithm for rearrangement distances considering repeated genes and intergenic regions.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology 16(1):21. Pub Date: 2021-10-13. DOI: 10.1186/s13015-021-00200-w
Gabriel Siqueira, Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Zanoni Dias
Abstract: The rearrangement distance is a method to compare genomes of different species. Such distance is the number of rearrangement events necessary to transform one genome into another. Two commonly studied events are the transposition, which exchanges two consecutive blocks of the genome, and the reversal, which reverts a block of the genome. When dealing with such problems, seminal works represented genomes as sequences of genes without repetition. More realistic models started to consider gene repetition or the presence of intergenic regions, sequences of nucleotides between genes and in the extremities of the genome. This work explores the transposition and reversal events applied in a genome representation considering both gene repetition and intergenic regions. We define two problems called Minimum Common Intergenic String Partition and Reverse Minimum Common Intergenic String Partition. Using a relation with these two problems, we show a [Formula: see text]-approximation for the Intergenic Transposition Distance, the Intergenic Reversal Distance, and the Intergenic Reversal and Transposition Distance problems, where k is the maximum number of copies of a gene in the genomes. Our practical experiments on simulated genomes show that the use of partitions improves the estimates for the distances.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8513232/pdf/
Citations: 2
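The two rearrangement events named in the abstract above can be illustrated with a minimal Python sketch. It treats a genome as a plain list of gene labels and ignores orientation and intergenic regions, which is a simplification of the model studied in the paper; the function names `reversal` and `transposition` are hypothetical and chosen for illustration only.

```python
def reversal(genome, i, j):
    """Return a copy of genome with the block genome[i:j] reversed."""
    if not (0 <= i < j <= len(genome)):
        raise ValueError("need 0 <= i < j <= len(genome)")
    return genome[:i] + genome[i:j][::-1] + genome[j:]


def transposition(genome, i, j, k):
    """Return a copy of genome with the adjacent blocks
    genome[i:j] and genome[j:k] exchanged."""
    if not (0 <= i < j < k <= len(genome)):
        raise ValueError("need 0 <= i < j < k <= len(genome)")
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


genome = ["a", "b", "c", "d", "e"]
print(reversal(genome, 1, 4))          # ['a', 'd', 'c', 'b', 'e']
print(transposition(genome, 0, 2, 5))  # ['c', 'd', 'e', 'a', 'b']
```

A rearrangement distance is then the length of a shortest sequence of such events turning one genome into the other; the sketch only applies single events and does not compute or approximate that distance.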
Heuristic algorithms for best match graph editing.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology 16(1):19. Pub Date: 2021-08-17. DOI: 10.1186/s13015-021-00196-3
David Schaller, Manuela Geiß, Marc Hellmuth, Peter F Stadler
Abstract: Background: Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Results: Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho's supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. Conclusion: Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369769/pdf/
Citations: 5
INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis.
IF 1.5, Zone 4, Biology
Algorithms for Molecular Biology 16(1):17. Pub Date: 2021-08-10. DOI: 10.1186/s13015-021-00198-1
Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, Leonid Chindelevitch
Abstract: Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data. Contribution: In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions and interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time. Results: We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via the Python Package Index (PyPI) under ingotdr. This package is also compatible with most of the tools in the scikit-learn machine learning library.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8353837/pdf/
Citations: 0
Approximate search for known gene clusters in new genomes using PQ-trees.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology 16(1):16. Pub Date: 2021-07-09. DOI: 10.1186/s13015-021-00190-9
Galia R Zimerman, Dina Svetlitsky, Meirav Zehavi, Michal Ziv-Ukelson
Abstract: Gene clusters are groups of genes that are co-locally conserved across various genomes, not necessarily in the same order. Their discovery and analysis is valuable in tasks such as gene annotation and prediction of gene interactions, and in the study of genome organization and evolution. The discovery of conserved gene clusters in a given set of genomes is a well-studied problem, but the rapid sequencing of prokaryotic genomes gives rise to a new problem: given an already known gene cluster that was discovered and studied in one genomic dataset, identify all the instances of the gene cluster in a given new genomic sequence. Thus, we define a new problem in comparative genomics, denoted PQ-TREE SEARCH, that takes as input a PQ-tree T representing the known gene orders of a gene cluster of interest, a gene-to-gene substitution scoring function h, integer arguments [Formula: see text] and [Formula: see text], and a new sequence of genes S. The objective is to identify in S approximate new instances of the gene cluster; these instances could vary from the known gene orders by genome rearrangements that are constrained by T, by gene substitutions that are governed by h, and by gene deletions and insertions that are bounded from above by [Formula: see text] and [Formula: see text], respectively. We prove that PQ-TREE SEARCH is NP-hard and propose a parameterized algorithm that solves the optimization variant of PQ-TREE SEARCH in [Formula: see text] time, where [Formula: see text] is the maximum degree of a node in T and [Formula: see text] is used to hide factors polynomial in the input size. The algorithm is implemented as a search tool, denoted PQFinder, and applied to search for instances of chromosomal gene clusters in plasmids, within a dataset of 1,487 prokaryotic genomes. We report on 29 chromosomal gene clusters that are rearranged in plasmids, where the rearrangements are guided by the corresponding PQ-trees. One of these results, coding for a heavy metal efflux pump, is further analysed to exemplify how PQFinder can be harnessed to reveal interesting new structural variants of known gene clusters.
Citations: 1
Shape decomposition algorithms for laser capture microdissection.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology 16(1):15. Pub Date: 2021-07-08. DOI: 10.1186/s13015-021-00193-6
Leonie Selbach, Tobias Kowalski, Klaus Gerwert, Maike Buchin, Axel Mosig
Abstract: Background: In the context of biomarker discovery and molecular characterization of diseases, laser capture microdissection is a highly effective approach to extract disease-specific regions from complex, heterogeneous tissue samples. For the extraction to be successful, these regions have to satisfy certain constraints in size and shape and thus have to be decomposed into feasible fragments. Results: We model this problem of constrained shape decomposition as the computation of optimal feasible decompositions of simple polygons. We use a skeleton-based approach and present an algorithmic framework that allows the implementation of various feasibility criteria as well as optimization goals. Motivated by our application, we consider different constraints and examine the resulting fragmentations. We evaluate our algorithm on lung tissue samples in comparison to a heuristic decomposition approach. Our method achieved a success rate of over 95% in the microdissection and tissue yield was increased by 10-30%. Conclusion: We present a novel approach for constrained shape decomposition and demonstrate its advantages for the microdissection of tissue samples. In comparison to the previous decomposition approach, the proposed method considerably increases the amount of successfully dissected tissue.
Citations: 5
Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology 16(1):14. Pub Date: 2021-07-06. DOI: 10.1186/s13015-021-00194-5
Leah L Weber, Mohammed El-Kebir
Abstract: Background: Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor's evolutionary process as either linear or branched and understanding what cancer types and which patients have each of these trajectories could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations with current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor's evolutionary history as either linear or branched. Results: We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem as a means of testing two alternative hypotheses for the pattern of evolution, which we prove to be NP-hard. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and real data application, we demonstrate the performance of our method, outperforming a competing machine learning approach. Conclusion: Phyolin is an accurate, easy-to-use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor's single-cell DNA sequencing data.
Citations: 4
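To make the linear-versus-branched distinction in the abstract above concrete, here is a minimal Python sketch. It assumes the common characterization that error-free data fit a linear phylogeny exactly when the sets of cells carrying each mutation are nested (each one contains the next); that characterization, the matrix layout, and the function name `is_linear_perfect_phylogeny` are assumptions for illustration. The sketch only checks a noise-free matrix and does not solve the LPPF problem or reproduce Phyolin.

```python
def is_linear_perfect_phylogeny(matrix):
    """Check whether a binary cell-by-mutation matrix is linear.

    matrix[c][m] == 1 means cell c carries mutation m. The matrix is
    treated as linear when the carrier sets of the mutations form a
    chain under set inclusion (an assumption stated in the lead-in).
    """
    if not matrix:
        return True
    n_mutations = len(matrix[0])
    carriers = [
        frozenset(c for c, row in enumerate(matrix) if row[m])
        for m in range(n_mutations)
    ]
    # Sort from largest to smallest carrier set; a chain under inclusion
    # exists exactly when every set contains its successor.
    carriers.sort(key=len, reverse=True)
    return all(small <= big for big, small in zip(carriers, carriers[1:]))


linear = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
branched = [[1, 1, 0], [1, 0, 1]]
print(is_linear_perfect_phylogeny(linear))    # True
print(is_linear_perfect_phylogeny(branched))  # False
```

Real single-cell data contain false negatives and false positives, which is why the paper poses a flipping problem rather than a plain check like this one.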
Bayesian optimization with evolutionary and structure-based regularization for directed protein evolution.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology 16(1):13. Pub Date: 2021-07-01. DOI: 10.1186/s13015-021-00195-4
Trevor S Frisby, Christopher James Langmead
Abstract: Background: Directed evolution (DE) is a technique for protein engineering that involves iterative rounds of mutagenesis and screening to search for sequences that optimize a given property, such as binding affinity to a specified target. Unfortunately, the underlying optimization problem is under-determined, and so mutations introduced to improve the specified property may come at the expense of unmeasured, but nevertheless important properties (e.g., solubility and thermostability). We address this issue by formulating DE as a regularized Bayesian optimization problem where the regularization term reflects evolutionary or structure-based constraints. Results: We applied our approach to DE to three representative proteins, GB1, BRCA1, and SARS-CoV-2 Spike, and evaluated both evolutionary and structure-based regularization terms. The results of these experiments demonstrate that: (i) structure-based regularization usually leads to better designs (and never hurts), compared to the unregularized setting; (ii) evolutionary-based regularization tends to be least effective; and (iii) regularization leads to better designs because it effectively focuses the search in certain areas of sequence space, making better use of the experimental budget. Additionally, like previous work in machine-learning-assisted DE, we find that our approach significantly reduces the experimental burden of DE, relative to model-free methods. Conclusion: Introducing regularization into a Bayesian ML-assisted DE framework alters the exploratory patterns of the underlying optimization routine, and can shift variant selections towards those with a range of targeted and desirable properties. In particular, we find that structure-based regularization often improves variant selection compared to unregularized approaches, and never hurts.
Citations: 5
Using the longest run subsequence problem within homology-based scaffolding.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology 16(1):11. Pub Date: 2021-06-28. DOI: 10.1186/s13015-021-00191-8
Sven Schrinner, Manish Goel, Michael Wulfert, Philipp Spohr, Korbinian Schneeberger, Gunnar W Klau
Abstract: Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed by a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. All data used in the experiments as well as our source code are freely available. We demonstrate its usefulness within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8240273/pdf/
Citations: 2
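As a rough illustration of the LRS problem mentioned in the abstract above, the following Python sketch assumes the usual reading of the problem: find a longest subsequence of a string in which every symbol occurs in at most one run of consecutive equal symbols. It uses a dynamic program over subsets of the alphabet, so it is exponential in the alphabet size and only suited to tiny inputs. This is not the authors' algorithm (their reduction rules and exact approaches are not reproduced here), and the function name is hypothetical.

```python
def longest_run_subsequence_length(s):
    """Length of a longest subsequence of s in which each symbol
    forms at most one run (exponential in the alphabet size)."""
    symbols = sorted(set(s))
    bit = {c: 1 << i for i, c in enumerate(symbols)}
    # best[(used, cur)] = longest valid subsequence of the prefix read
    # so far that uses exactly the symbols in bitmask `used` and whose
    # last run consists of symbol `cur` (None for the empty subsequence).
    best = {(0, None): 0}
    for ch in s:
        updated = dict(best)  # option 1: skip ch
        for (used, cur), length in best.items():
            if ch == cur:
                key = (used, cur)            # extend the current run
            elif not used & bit[ch]:
                key = (used | bit[ch], ch)   # open a new run for ch
            else:
                continue                     # ch already had its run
            if updated.get(key, -1) < length + 1:
                updated[key] = length + 1
        best = updated
    return max(best.values())


print(longest_run_subsequence_length("abab"))    # 3, e.g. "aab"
print(longest_run_subsequence_length("abcabc"))  # 4, e.g. "abcc"
```

In the scaffolding setting of the paper, the symbols would correspond to contigs of the second assembly and the string to the sequence of best-matching contigs along binned regions; that mapping is paraphrased from the abstract and not modeled in the sketch.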
The Bourque distances for mutation trees of cancers.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology 16(1):9. Pub Date: 2021-06-10. DOI: 10.1186/s13015-021-00188-3
Katharina Jahn, Niko Beerenwinkel, Louxin Zhang
Abstract: Background: Mutation trees are rooted trees in which nodes are of arbitrary degree and labeled with a mutation set. These trees, also referred to as clonal trees, are used in computational oncology to represent the mutational history of tumours. Classical tree metrics such as the popular Robinson-Foulds distance are of limited use for the comparison of mutation trees. One reason is that mutation trees inferred with different methods or for different patients often contain different sets of mutation labels. Results: We generalize the Robinson-Foulds distance into a set of distance metrics called Bourque distances for comparing mutation trees. We show the basic version of the Bourque distance for mutation trees can be computed in linear time. We also make a connection between the Robinson-Foulds distance and the nearest neighbor interchange distance.
Citations: 1
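For readers unfamiliar with the classical metric that the abstract above generalizes, here is a minimal Python sketch of the Robinson-Foulds distance for rooted, leaf-labeled trees, computed as the size of the symmetric difference of the two trees' cluster sets. The nested-tuple tree encoding and the function names are assumptions for illustration. The sketch does not implement the Bourque distances, which also handle mutation labels on internal nodes and differing label sets.

```python
def clusters(tree):
    """Return the set of leaf sets below each internal node.

    A tree is a nested tuple; anything that is not a tuple is a leaf label.
    """
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of clusters present in exactly one of the two rooted trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(robinson_foulds(t1, t1))  # 0
print(robinson_foulds(t1, t2))  # 4
```

This set-based version is quadratic in the worst case; linear-time algorithms exist for the classical distance but are not shown here.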
The energy-spectrum of bicompatible sequences.
IF 1, Zone 4, Biology
Algorithms for Molecular Biology 16(1):7. Pub Date: 2021-06-01. DOI: 10.1186/s13015-021-00187-4
Fenix W Huang, Christopher L Barrett, Christian M Reidys
Abstract: Background: Genotype-phenotype maps provide a meaningful filtration of sequence space and RNA secondary structures are particular such phenotypes. Compatible sequences, which satisfy the base-pairing constraints of a given RNA structure, play an important role in the context of neutral evolution. Sequences that are simultaneously compatible with two given structures (bicompatible sequences) are beacons in phenotypic transitions, induced by erroneously replicating populations of RNA sequences. RNA riboswitches, which are capable of expressing two distinct secondary structures without changing the underlying sequence, are one example of bicompatible sequences in living organisms. Results: We present a full loop energy model Boltzmann sampler of bicompatible sequences for pairs of structures. The sequence sampler employs a dynamic programming routine whose time complexity is polynomial when assuming the maximum number of exposed vertices, [Formula: see text], is a constant. The parameter [Formula: see text] depends on the two structures and can be very large. We introduce a novel topological framework encapsulating the relations between loops that sheds light on the understanding of [Formula: see text]. Based on this framework, we give an algorithm to sample sequences with minimum [Formula: see text] on a particular topologically classified case as well as giving hints to the solution in the other cases. As a result, we utilize our sequence sampler to study some established riboswitches. Conclusion: Our analysis of riboswitch sequences shows that a pair of structures needs to satisfy key properties in order to facilitate phenotypic transitions and that pairs of random structures are unlikely to do so. Our analysis observes a distinct signature of riboswitch sequences, suggesting a new criterion for identifying native sequences and sequences subjected to evolutionary pressure. Our free software is available at: https://github.com/FenixHuang667/Bifold.
Citations: 3
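The notion of a (bi)compatible sequence from the abstract above can be sketched in a few lines of Python. The sketch assumes structures are given in dot-bracket notation without pseudoknots and that the allowed pairs are the Watson-Crick pairs plus the G-U wobble pair; both the notation and the function names are assumptions for illustration. It only tests compatibility and is unrelated to the Boltzmann sampler or the Bifold code.

```python
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def base_pairs(structure):
    """List the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(sequence, structure):
    """True if every paired position holds an allowed base pair."""
    if len(sequence) != len(structure):
        return False
    return all(
        (sequence[i], sequence[j]) in ALLOWED_PAIRS
        for i, j in base_pairs(structure)
    )


def is_bicompatible(sequence, structure_a, structure_b):
    """True if the sequence is compatible with both structures."""
    return (is_compatible(sequence, structure_a)
            and is_compatible(sequence, structure_b))


seq = "GGGAAACCC"
print(is_bicompatible(seq, "(((...)))", ".((...))."))  # True
print(is_bicompatible(seq, "(((...)))", "()......."))  # False
```

Compatibility says nothing about thermodynamic stability: a compatible sequence can form the structure in principle, while the energy model in the paper weighs how favorable that is.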