Maximilian Radtke, Johanna Moch, Julia Hentschel, Isabell Schumann
{"title":"altAFplotter: a web app for reliable UPD detection in NGS diagnostics","authors":"Maximilian Radtke, Johanna Moch, Julia Hentschel, Isabell Schumann","doi":"10.1186/s12859-024-05922-3","DOIUrl":"https://doi.org/10.1186/s12859-024-05922-3","url":null,"abstract":"The detection of uniparental disomies (UPDs, the inheritance of both chromosome homologues from a single parent) is not part of most standard or commercial NGS pipelines in human genetics and is thus a common gap in NGS diagnostics. To address this, we developed a tool for UPD detection based on panel or exome data that is easy to use and publicly available. The app is freely available at https://altafplotter.uni-leipzig.de/ and implemented in Python, using the Streamlit framework for data science web apps. It uses bcftools and tabix to process VCF files. The source code is available at https://github.com/HUGLeipzig/altafplotter and can be used to host your own instance of the tool. We believe the app will be of great benefit to research and diagnostic labs that struggle to identify and interpret UPDs in their NGS diagnostic setup. The information provided allows a quick interpretation of the results and is thus suitable for high-throughput use by clinicians and biologists.","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"43 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142219750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PopMLvis: a tool for analysis and visualization of population structure using genotype data from genome-wide association studies","authors":"Mohamed Elshrif, Keivin Isufaj, Khalid Kunji, Mohamad Saad","doi":"10.1186/s12859-024-05908-1","DOIUrl":"https://doi.org/10.1186/s12859-024-05908-1","url":null,"abstract":"One of the aims of population genetics is to identify genetic differences/similarities among individuals of multiple ancestries. Many approaches including principal component analysis, clustering, and maximum likelihood techniques can be used to assign individuals to a given ancestry based on their genetic makeup. Although there are several tools that implement such algorithms, there is a lack of interactive visual platforms to run a variety of algorithms in one place. Therefore, we developed PopMLvis, a platform that offers an interactive environment to visualize genetic similarity data using several algorithms, and generate figures that can be easily integrated into scientific articles.","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"100 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142219724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BayesianSSA: a Bayesian statistical model based on structural sensitivity analysis for predicting responses to enzyme perturbations in metabolic networks","authors":"Shion Hosoda, Hisashi Iwata, Takuya Miura, Maiko Tanabe, Takashi Okada, Atsushi Mochizuki, Miwa Sato","doi":"10.1186/s12859-024-05921-4","DOIUrl":"https://doi.org/10.1186/s12859-024-05921-4","url":null,"abstract":"Chemical bioproduction has attracted attention as a key technology in a decarbonized society. In computational design for chemical bioproduction, it is necessary to predict changes in metabolic fluxes when up-/down-regulating enzymatic reactions, that is, responses of the system to enzyme perturbations. Structural sensitivity analysis (SSA) was previously developed as a method to predict qualitative responses to enzyme perturbations on the basis of the structural information of the reaction network. However, the network structural information can sometimes be insufficient to predict qualitative responses unambiguously, which is a practical issue in bioproduction applications. To address this, in this study, we propose BayesianSSA, a Bayesian statistical model based on SSA. BayesianSSA extracts environmental information from perturbation datasets collected in environments of interest and integrates it into SSA predictions. We applied BayesianSSA to synthetic and real datasets of the central metabolic pathway of Escherichia coli. Our result demonstrates that BayesianSSA can successfully integrate environmental information extracted from perturbation data into SSA predictions. In addition, the posterior distribution estimated by BayesianSSA can be associated with the known pathway reported to enhance succinate export flux in previous studies. We believe that BayesianSSA will accelerate the chemical bioproduction process and contribute to advancements in the field.","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"182 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142219725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yeremia Gunawan Adhisantoso, Tim Körner, Fabian Müntefering, Jörn Ostermann, Jan Voges
{"title":"HiCMC: High-Efficiency Contact Matrix Compressor","authors":"Yeremia Gunawan Adhisantoso, Tim Körner, Fabian Müntefering, Jörn Ostermann, Jan Voges","doi":"10.1186/s12859-024-05907-2","DOIUrl":"https://doi.org/10.1186/s12859-024-05907-2","url":null,"abstract":"Chromosome organization plays an important role in biological processes such as replication, regulation, and transcription. One way to study the relationship between chromosome structure and its biological functions is through Hi-C studies, a genome-wide method for capturing chromosome conformation. Such studies generate vast amounts of data. The problem is exacerbated by the fact that chromosome organization is dynamic, requiring snapshots at different points in time, further increasing the amount of data to be stored. We present a novel approach called the High-Efficiency Contact Matrix Compressor (HiCMC) for efficient compression of Hi-C data. By modeling the underlying structures found in the contact matrix, such as compartments and domains, HiCMC outperforms the state-of-the-art method CMC by approximately 8% and the other state-of-the-art methods cooler, LZMA, and bzip2 by over 50% across multiple cell lines and contact matrix resolutions. In addition, HiCMC integrates domain-specific information into the compressed bitstreams that it generates, and this information can be used to speed up downstream analyses. HiCMC is a novel compression approach that utilizes intrinsic properties of the contact matrix, such as compartments and domains. It achieves better compression than the state-of-the-art methods. HiCMC is available at https://github.com/sXperfect/hicmc .","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"60 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142219726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sarah E Fumagalli, Sean Smith, Tigran Ghazanchyan, Douglas Meyer, Rahul Paul, Collin Campbell, Luis Santana-Quintero, Anton Golikov, Juan Ibla, Haim Bar, Anton A Komar, Ryan C Hunt, Brian Lin, Michael DiCuccio, Chava Kimchi-Sarfaty
{"title":"Mouse embryo CoCoPUTs: novel murine transcriptomic-weighted usage website featuring multiple strains, tissues, and stages.","authors":"Sarah E Fumagalli, Sean Smith, Tigran Ghazanchyan, Douglas Meyer, Rahul Paul, Collin Campbell, Luis Santana-Quintero, Anton Golikov, Juan Ibla, Haim Bar, Anton A Komar, Ryan C Hunt, Brian Lin, Michael DiCuccio, Chava Kimchi-Sarfaty","doi":"10.1186/s12859-024-05906-3","DOIUrl":"10.1186/s12859-024-05906-3","url":null,"abstract":"<p><p>Mouse (Mus musculus) models have been heavily utilized in developmental biology research to understand mammalian embryonic development, as mice share many genetic, physiological, and developmental characteristics with humans. New explorations into the integration of temporal (stage-specific) and transcriptional (tissue-specific) data have expanded our knowledge of mouse embryo tissue-specific gene functions. To better understand the substantial impact of synonymous mutational variations in the cell-state-specific transcriptome on a tissue's codon and codon pair usage landscape, we have established a novel resource, Mouse Embryo Codon and Codon Pair Usage Tables (Mouse Embryo CoCoPUTs). This webpage offers not only codon and codon pair usage but also GC, dinucleotide, and junction dinucleotide usage, encompassing four strains, 15 murine embryonic tissue groups, 18 Theiler stages, and 26 embryonic days. Here, we leverage Mouse Embryo CoCoPUTs and use heatmaps to depict usage changes over time and a comparison to human usage for each strain and embryonic time point, highlighting unique differences and similarities. The usage similarities found between mouse and human central nervous system data highlight the translation for projects leveraging mouse models. Data for this analysis can be directly retrieved from Mouse Embryo CoCoPUTs. This cutting-edge resource plays a crucial role in deciphering the complex interplay between usage patterns and embryonic development, offering valuable insights into variation across diverse tissues, strains, and stages. Its applications extend across multiple domains, with notable advantages for biotherapeutic development, where optimizing codon usage can enhance protein expression; one can compare strains, tissues, and mouse embryonic stages in one query. Additionally, Mouse Embryo CoCoPUTs holds great potential in the field of tissue-specific genetic engineering, providing insights for tailoring gene expression to specific tissues for targeted interventions. Furthermore, this resource may enhance our understanding of the nuanced connections between usage biases and tissue-specific gene function, contributing to the development of more accurate predictive models for genetic disorders.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"294"},"PeriodicalIF":2.9,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11380194/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142145048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John Michael O Ranola, Carolyn Horton, Tina Pesaran, Shawn Fayer, Lea M Starita, Brian H Shirts
{"title":"Assigning credit where it is due: an information content score to capture the clinical value of multiplexed assays of variant effect.","authors":"John Michael O Ranola, Carolyn Horton, Tina Pesaran, Shawn Fayer, Lea M Starita, Brian H Shirts","doi":"10.1186/s12859-024-05920-5","DOIUrl":"10.1186/s12859-024-05920-5","url":null,"abstract":"<p><strong>Background: </strong>A variant can be pathogenic or benign with relation to a human disease. Current classification categories from benign to pathogenic reflect a probabilistic summary of the current understanding. A primary metric of clinical utility for multiplexed assays of variant effect (MAVE) is the number of variants that can be reclassified from uncertain significance (VUS). However, a gap in this measure of utility is that it underrepresents the information gained from MAVEs. The aim of this study was to develop an improved quantification metric for MAVE utility. We propose that an information content approach that includes data that do not reclassify variants will better reflect true information gain. We adopted an information content approach to evaluate the information gain, in bits, for MAVEs of BRCA1, PTEN, and TP53. Here, one bit represents the amount of information required to completely classify a single variant starting from no information.</p><p><strong>Results: </strong>BRCA1 MAVEs produced a total of 831.2 bits of information, 6.58% of the total missense information in BRCA1 and a 22-fold increase over the information that only contributed to VUS reclassification. PTEN MAVEs produced 2059.6 bits of information which represents 32.8% of the total missense information in PTEN and an 85-fold increase over the information that contributed to VUS reclassification. TP53 MAVEs produced 277.8 bits of information which represents 6.22% of the total missense information in TP53 and a 3.5-fold increase over the information that contributed to VUS reclassification.</p><p><strong>Conclusions: </strong>An information content approach will more accurately portray information gained through MAVE mapping efforts than counting the number of variants reclassified. This information content approach may also help define the impact of guideline changes that modify the information definitions used to classify groups of variants.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"295"},"PeriodicalIF":2.9,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11380199/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142145037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stephan Weißbach, Jonas Milkovits, Stefan Pastore, Martin Heine, Susanne Gerber, Hristo Todorov
{"title":"Cortexa: a comprehensive resource for studying gene expression and alternative splicing in the murine brain.","authors":"Stephan Weißbach, Jonas Milkovits, Stefan Pastore, Martin Heine, Susanne Gerber, Hristo Todorov","doi":"10.1186/s12859-024-05919-y","DOIUrl":"10.1186/s12859-024-05919-y","url":null,"abstract":"<p><strong>Background: </strong>Gene expression and alternative splicing are strictly regulated processes that shape brain development and determine the cellular identity of differentiated neural cell populations. Despite the availability of multiple valuable datasets, many functional implications, especially those related to alternative splicing, remain poorly understood. Moreover, neuroscientists working primarily experimentally often lack the bioinformatics expertise required to process alternative splicing data and produce meaningful and interpretable results. Notably, re-analyzing publicly available datasets and integrating them with in-house data can provide substantial novel insights. However, such analyses necessitate developing harmonized data handling and processing pipelines which in turn require considerable computational resources and in-depth bioinformatics expertise.</p><p><strong>Results: </strong>Here, we present Cortexa, a comprehensive web portal that incorporates RNA-sequencing datasets from the mouse cerebral cortex (longitudinal or cell-specific) and the hippocampus. Cortexa facilitates understandable visualization of the expression and alternative splicing patterns of individual genes. Our platform provides SplicePCA, a tool that allows users to integrate their alternative splicing dataset and compare it to cell-specific or developmental neocortical splicing patterns. All standardized gene expression and alternative splicing datasets can be downloaded for further in-depth downstream analysis without the need for extensive preprocessing.</p><p><strong>Conclusions: </strong>Cortexa provides a robust and readily available resource for unraveling the complexity of gene expression and alternative splicing regulatory processes in the mouse brain. The data portal is available at https://cortexa-rna.com/.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"293"},"PeriodicalIF":2.9,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11378610/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142139221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lan-Yun Chang, Ting-Yi Hao, Wei-Jie Wang, Chun-Yu Lin
{"title":"Inference of single-cell network using mutual information for scRNA-seq data analysis.","authors":"Lan-Yun Chang, Ting-Yi Hao, Wei-Jie Wang, Chun-Yu Lin","doi":"10.1186/s12859-024-05895-3","DOIUrl":"10.1186/s12859-024-05895-3","url":null,"abstract":"<p><strong>Background: </strong>With advances in single-cell RNA sequencing (scRNA-seq) technology, deriving inherent biological system information from expression profiles at a single-cell resolution has become possible. It is known that network modeling by estimating the associations between genes could better reveal dynamic changes in biological systems. However, accurately constructing a single-cell network (SCN) to capture the network architecture of each cell and further explore cell-to-cell heterogeneity remains challenging.</p><p><strong>Results: </strong>We introduce SINUM, a method for constructing the SIngle-cell Network Using Mutual information, which estimates mutual information between any two genes from scRNA-seq data to determine whether they are dependent or independent in a specific cell. Experiments on various scRNA-seq datasets with different cell numbers based on eight performance indexes (e.g., adjusted rand index and F-measure index) validated the accuracy and robustness of SINUM in cell type identification, superior to the state-of-the-art SCN inference method. Additionally, the SINUM SCNs exhibit high overlap with the human interactome and possess the scale-free property.</p><p><strong>Conclusions: </strong>SINUM presents a view of biological systems at the network level to detect cell-type marker genes/gene pairs and investigate time-dependent changes in gene associations during embryo development. Code for SINUM is freely available at https://github.com/SysMednet/SINUM .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 Suppl 2","pages":"292"},"PeriodicalIF":2.9,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11378379/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142139222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Siddhartha G Jena, Archit Verma, Barbara E Engelhardt
{"title":"Answering open questions in biology using spatial genomics and structured methods.","authors":"Siddhartha G Jena, Archit Verma, Barbara E Engelhardt","doi":"10.1186/s12859-024-05912-5","DOIUrl":"10.1186/s12859-024-05912-5","url":null,"abstract":"<p><p>Genomics methods have uncovered patterns in a range of biological systems, but obscure important aspects of cell behavior: the shapes, relative locations, movement, and interactions of cells in space. Spatial technologies that collect genomic or epigenomic data while preserving spatial information have begun to overcome these limitations. These new data promise a deeper understanding of the factors that affect cellular behavior, and in particular the ability to directly test existing theories about cell state and variation in the context of morphology, location, motility, and signaling that could not be tested before. Rapid advancements in resolution, ease-of-use, and scale of spatial genomics technologies to address these questions also require an updated toolkit of statistical methods with which to interrogate these data. We present a framework to respond to this new avenue of research: four open biological questions that can now be answered using spatial genomics data paired with methods for analysis. We outline spatial data modalities for each open question that may yield specific insights, discuss how conflicting theories may be tested by comparing the data to conceptual models of biological behavior, and highlight statistical and machine learning-based tools that may prove particularly helpful to recover biological understanding.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"291"},"PeriodicalIF":2.9,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11375982/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142131751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abdullah Asım Emül, Mehmet Arif Ergün, Rumeysa Aslıhan Ertürk, Ömer Çinal, Mehmet Baysan
{"title":"VCF observer: a user-friendly software tool for preliminary VCF file analysis and comparison.","authors":"Abdullah Asım Emül, Mehmet Arif Ergün, Rumeysa Aslıhan Ertürk, Ömer Çinal, Mehmet Baysan","doi":"10.1186/s12859-024-05860-0","DOIUrl":"10.1186/s12859-024-05860-0","url":null,"abstract":"<p><strong>Background: </strong>Advancements over the past decade in DNA sequencing technology and computing power have created the potential to revolutionize medicine. There has been a marked increase in genetic data available, allowing for the advancement of areas such as personalized medicine. A crucial type of data in this context is genetic variant data which is stored in variant call format (VCF) files. However, the rapid growth in genomics has presented challenges in analyzing and comparing VCF files.</p><p><strong>Results: </strong>In response to the limitations of existing tools, this paper introduces a novel web application that provides a user-friendly solution for VCF file analyses and comparisons. The software tool enables researchers and clinicians to perform high-level analysis with ease and enhances productivity. The application's interface allows users to conveniently upload, analyze, and visualize their VCF files using simple drag-and-drop and point-and-click operations. Essential visualizations such as Venn diagrams, clustergrams, and precision-recall plots are provided to users. A key feature of the application is its support for metadata-based file grouping, accomplished through flexible data matrix uploads, streamlining organization and analysis of user-defined categories. Additionally, the application facilitates standardized benchmarking of VCF files by integrating user-provided ground truth regions and variant lists.</p><p><strong>Conclusions: </strong>By providing a user-friendly interface and supporting essential visualizations, this software enhances the accessibility of VCF file analysis and assists researchers and clinicians in their scientific inquiries.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"290"},"PeriodicalIF":2.9,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11373448/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142124727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}