{"title":"Pangenome comparison via ED strings.","authors":"Esteban Gabory, Moses Njagi Mwaniki, Nadia Pisanti, Solon P Pissis, Jakub Radoszewski, Michelle Sweering, Wiktor Zuba","doi":"10.3389/fbinf.2024.1397036","DOIUrl":"10.3389/fbinf.2024.1397036","url":null,"abstract":"<p><strong>Introduction: </strong>An elastic-degenerate (ED) string is a sequence of sets of strings. It can also be seen as a directed acyclic graph whose edges are labeled by strings. The notion of ED strings was introduced as a simple alternative to variation and sequence graphs for representing a pangenome, that is, a collection of genomic sequences to be analyzed jointly or to be used as a reference.</p><p><strong>Methods: </strong>In this study, we define notions of <i>matching statistics</i> of two ED strings as similarity measures between pangenomes and, consequently, infer a corresponding distance measure. We then show that both measures can be computed efficiently, in both theory and practice, by employing the <i>intersection graph</i> of two ED strings.</p><p><strong>Results: </strong>We also implemented our methods as a software tool for pangenome comparison and evaluated their efficiency and effectiveness using both synthetic and real datasets.</p><p><strong>Discussion: </strong>As for efficiency, we compare the runtime of the intersection graph method against the classic product automaton construction, showing that the intersection graph is faster by up to one order of magnitude. To show effectiveness, we used real SARS-CoV-2 datasets and our matching statistics similarity measure to reproduce a well-established clade classification of SARS-CoV-2, thus demonstrating that the classification obtained by our method is in accordance with the existing one.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1397036"},"PeriodicalIF":2.8,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464492/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142402117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QSPRmodeler - An open source application for molecular predictive analytics.","authors":"Rafał A Bachorz, Damian Nowak, Marcin Ratajewski","doi":"10.3389/fbinf.2024.1441024","DOIUrl":"10.3389/fbinf.2024.1441024","url":null,"abstract":"<p><p>The drug design process can be successfully supported using a variety of <i>in silico</i> methods. Some of these are oriented toward molecular property prediction, which is a key step in the early drug discovery stage. Before experimental validation, drug candidates are usually compared with known experimental data. Technically, this can be achieved using machine learning approaches, in which selected experimental data are used to train the predictive models. The proposed Python software is designed for this purpose. It supports the entire workflow of molecular data processing, starting from raw data preparation followed by molecular descriptor creation and machine learning model training. The predictive capabilities of the resulting models were carefully validated internally and externally. These models can be easily applied to new compounds, including within more complex workflows involving generative approaches.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1441024"},"PeriodicalIF":2.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464749/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142402118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The quantum hypercube as a k-mer graph.","authors":"Gustavo Becerra-Gavino, Liliana Ibeth Barbosa-Santillan","doi":"10.3389/fbinf.2024.1401223","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1401223","url":null,"abstract":"<p><p>The application of quantum principles in computing has garnered interest since the 1980s. Today, this concept is not only theoretical, but we have the means to design and execute techniques that leverage the quantum principles to perform calculations. The emergence of the quantum walk search technique exemplifies the practical application of quantum concepts and their potential to revolutionize information technologies. It promises to be versatile and may be applied to various problems. For example, the coined quantum walk search allows for identifying a marked item in a combinatorial search space, such as the quantum hypercube. The quantum hypercube organizes the qubits such that the qubit states represent the vertices and the edges represent the transitions to the states differing by one qubit state. It offers a novel framework to represent k-mer graphs in the quantum realm. Thus, the quantum hypercube facilitates the exploitation of parallelism, which is made possible through superposition and entanglement to search for a marked k-mer. However, as found in the analysis of the results, the search is only sometimes successful in hitting the target. Thus, through a meticulous examination of the quantum walk search circuit outcomes, evaluating what input-target combinations are useful, and a visionary exploration of DNA k-mer search, this paper opens the door to innovative possibilities, laying down the groundwork for further research to bridge the gap between theoretical conjecture in quantum computing and a tangible impact in bioinformatics.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1401223"},"PeriodicalIF":2.8,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11425167/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142333667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual analysis of multi-omics data.","authors":"Austin Swart, Ron Caspi, Suzanne Paley, Peter D Karp","doi":"10.3389/fbinf.2024.1395981","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1395981","url":null,"abstract":"<p><p>We present a tool for multi-omics data analysis that enables simultaneous visualization of up to four types of omics data on organism-scale metabolic network diagrams. The tool's interactive web-based metabolic charts depict the metabolic reactions, pathways, and metabolites of a single organism as described in a metabolic pathway database for that organism; the charts are constructed using automated graphical layout algorithms. The multi-omics visualization facility paints each individual omics dataset onto a different \"visual channel\" of the metabolic-network diagram. For example, a transcriptomics dataset might be displayed by coloring the reaction arrows within the metabolic chart, while a companion proteomics dataset is displayed as reaction arrow thicknesses, and a complementary metabolomics dataset is displayed as metabolite node colors. Once the network diagrams are painted with omics data, semantic zooming provides more details within the diagram as the user zooms in. Datasets containing multiple time points can be displayed in an animated fashion. The tool will also graph data values for individual reactions or metabolites designated by the user. The user can interactively adjust the mapping from data value ranges to the displayed colors and thicknesses to provide more informative diagrams.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1395981"},"PeriodicalIF":2.8,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11420163/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142333668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A review of model evaluation metrics for machine learning in genetics and genomics.","authors":"Catriona Miller, Theo Portlock, Denis M Nyaga, Justin M O'Sullivan","doi":"10.3389/fbinf.2024.1457619","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1457619","url":null,"abstract":"<p><p>Machine learning (ML) has shown great promise in genetics and genomics where large and complex datasets have the potential to provide insight into many aspects of disease risk, pathogenesis of genetic disorders, and prediction of health and wellbeing. However, with this possibility there is a responsibility to exercise caution against biases and inflation of results that can have harmful unintended impacts. Therefore, researchers must understand the metrics used to evaluate ML models which can influence the critical interpretation of results. In this review we provide an overview of ML metrics for clustering, classification, and regression and highlight the advantages and disadvantages of each. We also detail common pitfalls that occur during model evaluation. Finally, we provide examples of how researchers can assess and utilise the results of ML models, specifically from a genomics perspective.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1457619"},"PeriodicalIF":2.8,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11420621/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142333666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular docking and molecular dynamic simulation studies to identify potential terpenes against Internalin A protein of <i>Listeria monocytogenes</i>.","authors":"Deepasree K, Subhashree Venugopal","doi":"10.3389/fbinf.2024.1463750","DOIUrl":"10.3389/fbinf.2024.1463750","url":null,"abstract":"<p><strong>Introduction: </strong>Ever since the outbreak of listeriosis and other related illnesses caused by the dreadful pathogen <i>Listeria monocytogenes</i>, the lives of immunocompromised individuals have been at risk.</p><p><strong>Objectives and methods: </strong>The main goal of this study is to comprehend the potential of terpenes, a major class of secondary metabolites, to inhibit one of the disease-causing proteins of the pathogen, Internalin A (InlA), via <i>in silico</i> approaches.</p><p><strong>Results: </strong>The best binding affinity value of -9.5 kcal/mol was observed for Bipinnatin and Epispongiadiol according to the molecular docking studies. The compounds were further subjected to ADMET and biological activity estimation, which confirmed their good pharmacokinetic properties and antibacterial activity.</p><p><strong>Discussion: </strong>Molecular dynamic simulation for a timescale of 100 ns finally revealed Epispongiadiol to be a promising drug-like compound that could possibly pave the way to the treatment of this disease.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1463750"},"PeriodicalIF":2.8,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11412924/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142302476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PhIP-Seq: methods, applications and challenges.","authors":"Ziru Huang, Samarappuli Mudiyanselage Savini Gunarathne, Wenwen Liu, Yuwei Zhou, Yuqing Jiang, Shiqi Li, Jian Huang","doi":"10.3389/fbinf.2024.1424202","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1424202","url":null,"abstract":"<p><p>Phage-immunoprecipitation sequencing (PhIP-Seq) technology is an innovative, high-throughput antibody detection method. It enables comprehensive analysis of individual antibody profiles. This technology shows great potential, particularly in exploring disease mechanisms and immune responses. Currently, PhIP-Seq has been successfully applied in various fields, such as the exploration of biomarkers for autoimmune diseases, vaccine development, and allergen detection. A variety of bioinformatics tools have facilitated the development of this process. However, PhIP-Seq technology still faces many challenges and has room for improvement. Here, we review the methods, applications, and challenges of PhIP-Seq and discuss its future directions in immunological research and clinical applications. With continuous progress and optimization, PhIP-Seq is expected to play an even more important role in future biomedical research, providing new ideas and methods for disease prevention, diagnosis, and treatment.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1424202"},"PeriodicalIF":2.8,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11408297/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142302500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rvisdiff: An R package for interactive visualization of differential expression.","authors":"David Barrios, Carlos Prieto","doi":"10.3389/fbinf.2024.1349205","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1349205","url":null,"abstract":"<p><p>Rvisdiff is an R/Bioconductor package that generates an interactive interface for the interpretation of differential expression results. It creates a local web page that enables the exploration of statistical analysis results through the generation of auto-analytical visualizations. Users can explore the differential expression results and the source expression data interactively in the same view. As input, the package supports the results of popular differential expression packages such as DESeq2, edgeR, and limma. As output, the package generates a local HTML page that can be easily viewed in a web browser. Rvisdiff is freely available at https://bioconductor.org/packages/Rvisdiff/.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1349205"},"PeriodicalIF":2.8,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11402892/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142302501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"<i>Rhizobium etli</i> CFN42 and <i>Sinorhizobium meliloti</i> 1021 bioinformatic transcriptional regulatory networks from culture and symbiosis.","authors":"Hermenegildo Taboada-Castro, Alfredo José Hernández-Álvarez, Juan Miguel Escorcia-Rodríguez, Julio Augusto Freyre-González, Edgardo Galán-Vásquez, Sergio Encarnación-Guevara","doi":"10.3389/fbinf.2024.1419274","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1419274","url":null,"abstract":"<p><p><i>Rhizobium etli</i> CFN42 proteome-transcriptome mixed data of exponential growth and nitrogen-fixing bacteroids, as well as <i>Sinorhizobium meliloti</i> 1021 transcriptome data of growth and nitrogen-fixing bacteroids, were integrated into transcriptional regulatory networks (TRNs). The one-step construction network consisted of a matrix-clustering analysis of matrices of the gene profile and all matrices of the transcription factors (TFs) of their genome. The networks were constructed with the prediction of regulatory network application of the RhizoBindingSites database (http://rhizobindingsites.ccg.unam.mx/). The deduced free-living <i>Rhizobium etli</i> network contained 1,146 genes, including 380 TFs and 12 sigma factors. In addition, the bacteroid <i>R. etli</i> CFN42 network contained 884 genes, where 364 were TFs, and 12 were sigma factors, whereas the deduced free-living <i>Sinorhizobium meliloti</i> 1021 network contained 643 genes, where 259 were TFs and seven were sigma factors, and the bacteroid <i>Sinorhizobium meliloti</i> 1021 network contained 357 genes, where 210 were TFs and six were sigma factors. The similarity of these deduced condition-dependent networks and the biological <i>E. coli</i> and <i>B. subtilis</i> independent condition networks segregates from the random Erdős-Rényi networks. Deduced networks showed a low average clustering coefficient. They were not scale-free, showing a gradually diminishing hierarchy of TFs in contrast to the hierarchical role of the sigma factor <i>rpoD</i> in the <i>E. coli</i> K12 network. For rhizobia networks, partitioning the genome in the chromosome, chromids, and plasmids, where essential genes are distributed, and the symbiotic ability that is mostly coded in plasmids, may alter the structure of these deduced condition-dependent networks. It provides potential TF gene-target relationship data for constructing regulons, which are the basic units of a TRN.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1419274"},"PeriodicalIF":2.8,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11387232/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142302475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design principles for molecular animation.","authors":"Stuart G Jantzen, Gaël McGill, Jodie Jenkinson","doi":"10.3389/fbinf.2024.1353807","DOIUrl":"10.3389/fbinf.2024.1353807","url":null,"abstract":"<p><p>Molecular visualization is a powerful way to represent the complex structure of molecules and their higher order assemblies, as well as the dynamics of their interactions. Although conventions for depicting static molecular structures and complexes are now well established and guide the viewer's attention to specific aspects of structure and function, little attention and design classification has been devoted to how molecular motion is depicted. As we continue to probe and discover how molecules move - including their internal flexibility, conformational changes and dynamic associations with binding partners and environments - we are faced with difficult design challenges that are relevant to molecular visualizations both for the scientific community and students of cell and molecular biology. To facilitate these design decisions, we have identified twelve molecular animation design principles that are important to consider when creating molecular animations. Many of these principles pertain to misconceptions that students have primarily regarding the agency of molecules, while others are derived from visual treatments frequently observed in molecular animations that may promote misconceptions. For each principle, we have created a pair of molecular animations that exemplify the principle by depicting the same content in the presence and absence of that design approach. Although not intended to be prescriptive, we hope this set of design principles can be used by the scientific, education, and scientific visualization communities to facilitate and improve the pedagogical effectiveness of molecular animation.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1353807"},"PeriodicalIF":2.8,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371733/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142134659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}