Hua-Ting Yao, Bertrand Marchand, Sarah J Berkemer, Yann Ponty, Sebastian Will
{"title":"Infrared: a declarative tree decomposition-powered framework for bioinformatics.","authors":"Hua-Ting Yao, Bertrand Marchand, Sarah J Berkemer, Yann Ponty, Sebastian Will","doi":"10.1186/s13015-024-00258-2","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailored towards specific settings, complex to develop, and hard to implement and adapt to problem variations.</p><p><strong>Methods: </strong>We introduce the Infrared framework to overcome such hindrances for a large class of problems. Its underlying paradigm is tailored toward problems that can be declaratively formalized as sparse feature networks, a generalization of constraint networks. Classic Boolean constraints specify a search space, consisting of putative solutions whose evaluation is performed through a combination of features. Problems are then solved using generic cluster tree elimination algorithms over a tree decomposition of the feature network. Their overall complexities are linear on the number of variables, and only exponential in the treewidth of the feature network. For sparse feature networks, associated with low to moderate treewidths, these algorithms allow to find optimal solutions, or generate controlled samples, with practical empirical efficiency.</p><p><strong>Results: </strong>Implementing these methods, the Infrared software allows Python programmers to rapidly develop exact optimization and sampling applications based on a tree decomposition-based efficient processing. Instead of directly coding specialized algorithms, problems are declaratively modeled as sets of variables over finite domains, whose dependencies are captured by constraints and functions. Such models are then automatically solved by generic DP algorithms. To illustrate the applicability of Infrared in bioinformatics and guide new users, we model and discuss variants of bioinformatics applications. We provide reimplementations and extensions of methods for RNA design, RNA sequence-structure alignment, parsimony-driven inference of ancestral traits in phylogenetic trees/networks, and design of coding sequences. Moreover, we demonstrate multidimensional Boltzmann sampling. These applications of the framework-together with our novel results-underline the practical relevance of Infrared. Remarkably, the achieved complexities are typically equivalent to the ones of specialized algorithms and implementations.</p><p><strong>Availability: </strong>Infrared is available at https://amibio.gitlabpages.inria.fr/Infrared with extensive documentation, including various usage examples and API reference; it can be installed using Conda or from source.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"19 1","pages":"13"},"PeriodicalIF":1.5000,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10943887/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-024-00258-2","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Motivation: Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailored towards specific settings, complex to develop, and hard to implement and adapt to problem variations.
Methods: We introduce the Infrared framework to overcome such hindrances for a large class of problems. Its underlying paradigm is tailored toward problems that can be declaratively formalized as sparse feature networks, a generalization of constraint networks. Classic Boolean constraints specify a search space, consisting of putative solutions whose evaluation is performed through a combination of features. Problems are then solved using generic cluster tree elimination algorithms over a tree decomposition of the feature network. Their overall complexities are linear on the number of variables, and only exponential in the treewidth of the feature network. For sparse feature networks, associated with low to moderate treewidths, these algorithms allow to find optimal solutions, or generate controlled samples, with practical empirical efficiency.
Results: Implementing these methods, the Infrared software allows Python programmers to rapidly develop exact optimization and sampling applications based on a tree decomposition-based efficient processing. Instead of directly coding specialized algorithms, problems are declaratively modeled as sets of variables over finite domains, whose dependencies are captured by constraints and functions. Such models are then automatically solved by generic DP algorithms. To illustrate the applicability of Infrared in bioinformatics and guide new users, we model and discuss variants of bioinformatics applications. We provide reimplementations and extensions of methods for RNA design, RNA sequence-structure alignment, parsimony-driven inference of ancestral traits in phylogenetic trees/networks, and design of coding sequences. Moreover, we demonstrate multidimensional Boltzmann sampling. These applications of the framework-together with our novel results-underline the practical relevance of Infrared. Remarkably, the achieved complexities are typically equivalent to the ones of specialized algorithms and implementations.
Availability: Infrared is available at https://amibio.gitlabpages.inria.fr/Infrared with extensive documentation, including various usage examples and API reference; it can be installed using Conda or from source.
期刊介绍:
Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning.
Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms.
Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.