Bioinformatics advances. Pub Date: 2024-08-30. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae130
Yael Kupershmidt, Simon Kasif, Roded Sharan
{"title":"SPIDER: constructing cell-type-specific protein-protein interaction networks.","authors":"Yael Kupershmidt, Simon Kasif, Roded Sharan","doi":"10.1093/bioadv/vbae130","DOIUrl":"https://doi.org/10.1093/bioadv/vbae130","url":null,"abstract":"<p><strong>Motivation: </strong>Protein-protein interactions (PPIs) play essential roles in the buildup of cellular machinery and provide the skeleton for cellular signaling. However, these biochemical roles are context dependent and interactions may change across cell type, time, and space. In contrast, PPI detection assays are run in a single condition that may not even be an endogenous condition of the organism, resulting in static networks that do not reflect full cellular complexity. Thus, there is a need for computational methods to predict cell-type-specific interactions.</p><p><strong>Results: </strong>Here we present SPIDER (Supervised Protein Interaction DEtectoR), a graph attention-based model for predicting cell-type-specific PPI networks. In contrast to previous attempts at this problem, which were unsupervised in nature, our model's training is guided by experimentally measured cell-type-specific networks, enhancing its performance. We evaluate our method using experimental data of cell-type-specific networks from both humans and mice, and show that it outperforms current approaches by a large margin. We further demonstrate the ability of our method to generalize the predictions to datasets of tissues lacking prior PPI experimental data. 
We leverage the networks predicted by the model to facilitate the identification of tissue-specific disease genes.</p><p><strong>Availability and implementation: </strong>Our code and data are available at https://github.com/Kuper994/SPIDER.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11438548/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142333637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
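SPIDER is described above as a graph attention-based model. As general background only, and not as a description of SPIDER's actual architecture, the following is a minimal sketch of a single graph-attention head in plain Python: each node scores its neighbours, normalises the scores with a softmax, and takes the weighted average of their projected features. The function names, the toy proteins `A`, `B`, `C`, their feature vectors, and the parameters `W` and `a` are all hypothetical, and the output nonlinearity usually applied after aggregation is omitted for brevity.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU used to score node pairs before the softmax."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_head(features, neighbors, W, a):
    """One graph-attention head (illustrative sketch, not SPIDER's code).

    features:  dict node -> feature vector (list of floats)
    neighbors: dict node -> list of adjacent nodes
    W:         shared projection matrix (list of rows)
    a:         attention vector of length 2 * len(W)
    """
    projected = {n: matvec(W, h) for n, h in features.items()}
    output = {}
    for node, z_i in projected.items():
        # Attend over the node itself plus its neighbours.
        hood = [node] + [m for m in neighbors.get(node, []) if m != node]
        # Unnormalised score: LeakyReLU(a . [z_i || z_j]).
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, z_i + projected[m])))
            for m in hood
        ]
        # Numerically stable softmax over the neighbourhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted average of the projected features.
        output[node] = [
            sum(al * projected[m][k] for al, m in zip(alphas, hood))
            for k in range(len(z_i))
        ]
    return output


# Hypothetical toy network: three proteins with 2-dimensional features.
toy_features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
toy_neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_attention = [0.0, 0.0, 0.0, 0.0]  # all scores equal -> uniform weights
toy_output = attention_head(toy_features, toy_neighbors, identity, zero_attention)
```

With the all-zero attention vector every neighbour receives the same weight, so each output is simply the mean over the node's neighbourhood; a trained attention vector would instead weight some interaction partners more heavily than others.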
Bioinformatics advances. Pub Date: 2024-08-29. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae126
Virginie Grosboillot, Anna Dragoš
{"title":"synphage: a pipeline for phage genome synteny graphics focused on gene conservation.","authors":"Virginie Grosboillot, Anna Dragoš","doi":"10.1093/bioadv/vbae126","DOIUrl":"10.1093/bioadv/vbae126","url":null,"abstract":"<p><strong>Motivation: </strong>Visualization and comparison of genome maps of bacteriophages can be very effective, but none of the tools available on the market allow visualization of gene conservation between multiple sequences at a glance. In addition, most bioinformatic tools running locally are command line only, making them hard to set up, debug, and monitor.</p><p><strong>Results: </strong>To address these motivations, we developed synphage, an easy-to-use and intuitive tool to generate synteny diagrams from GenBank files. This software has a user-friendly interface and uses metadata to monitor the progress and success of the data transformation process. The output plot features colour-coded genes according to their degree of conservation among the group of displayed sequences. The strength of synphage lies also in its modularity and the ability to generate multiple plots with different configurations without having to re-process all the data. In conclusion, synphage reduces the bioinformatic workload of users and allows them to focus on analysis, the most impactful area of their work.</p><p><strong>Availability and implementation: </strong>The synphage tool is implemented in the Python language and is available from the GitHub repository at https://github.com/vestalisvirginis/synphage. This software is released under an Apache-2.0 licence. A PyPI synphage package is available at https://pypi.org/project/synphage/ and a containerized version is available at https://hub.docker.com/r/vestalisvirginis/synphage. 
Contributions to the software are welcome whether it is reporting a bug or proposing new features and the contribution guidelines are available at https://github.com/vestalisvirginis/synphage/blob/main/CONTRIBUTING.md.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368388/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-08-29. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae129
Fernando Sola, Daniel Ayala, Marina Pulido, Rafael Ayala, Lorena López-Cerero, Inma Hernández, David Ruiz
{"title":"ginmappeR: an unified approach for integrating gene and protein identifiers across biological sequence databases.","authors":"Fernando Sola, Daniel Ayala, Marina Pulido, Rafael Ayala, Lorena López-Cerero, Inma Hernández, David Ruiz","doi":"10.1093/bioadv/vbae129","DOIUrl":"https://doi.org/10.1093/bioadv/vbae129","url":null,"abstract":"<p><strong>Summary: </strong>The proliferation of biological sequence data, due to developments in molecular biology techniques, has led to the creation of numerous open access databases on gene and protein sequencing. However, the lack of direct equivalence between identifiers across these databases hinders data integration. To address this challenge, we introduce <i>ginmappeR</i>, an integrated R package facilitating the translation of gene and protein identifiers between databases. By providing a unified interface, <i>ginmappeR</i> streamlines the integration of diverse data sources into biological workflows, enhancing efficiency and user experience.</p><p><strong>Availability and implementation: </strong>Available from Bioconductor: https://bioconductor.org/packages/ginmappeR.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11387618/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142302316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-08-26. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae125
James G Davies, Georgina E Menzies
{"title":"Utilizing biological experimental data and molecular dynamics for the classification of mutational hotspots through machine learning.","authors":"James G Davies, Georgina E Menzies","doi":"10.1093/bioadv/vbae125","DOIUrl":"10.1093/bioadv/vbae125","url":null,"abstract":"<p><strong>Motivation: </strong>Benzo[<i>a</i>]pyrene, a notorious DNA-damaging carcinogen, belongs to the family of polycyclic aromatic hydrocarbons commonly found in tobacco smoke. Surprisingly, nucleotide excision repair (NER) machinery exhibits inefficiency in recognizing specific bulky DNA adducts including Benzo[<i>a</i>]pyrene Diol-Epoxide (BPDE), a Benzo[<i>a</i>]pyrene metabolite. While sequence context is emerging as the leading factor linking the inadequate NER response to BPDE adducts, the precise structural attributes governing these disparities remain inadequately understood. We therefore combined the domains of molecular dynamics and machine learning to conduct a comprehensive assessment of helical distortion caused by BPDE-Guanine adducts in multiple gene contexts. Specifically, we implemented a dual approach involving a random forest classification-based analysis and subsequent feature selection to identify precise topological features that may distinguish adduct sites of variable repair capacity. Our models were trained using helical data extracted from duplexes representing both BPDE hotspot and nonhotspot sites within the <i>TP53</i> gene, then applied to sites within <i>TP53</i>, <i>cII</i>, and <i>lacZ</i> genes.</p><p><strong>Results: </strong>We show our optimized model consistently achieved exceptional performance, with accuracy, precision, and f1 scores exceeding 91%. Our feature selection approach uncovered that discernible variance in regional base pair rotation played a pivotal role in informing the decisions of our model. 
Notably, these disparities were highly conserved among <i>TP53</i> and <i>lacZ</i> duplexes and appeared to be influenced by the regional GC content. As such, our findings suggest that there are indeed conserved topological features distinguishing hotspot and nonhotspot sites, highlighting regional GC content as a potential biomarker for mutation.</p><p><strong>Availability and implementation: </strong>Code for comparing machine learning classifiers and evaluating their performance is available at https://github.com/jdavies24/ML-Classifier-Comparison, and code for analysing DNA structure with Curves+ and Canal using Random Forest is available at https://github.com/jdavies24/ML-classification-of-DNA-trajectories.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11377099/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142141872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-08-26. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae127
Joseph A Cogan, Natalia Benova, Rene Kuklinkova, James R Boyne, Chinedu A Anene
{"title":"Meta-analysis of RNA interaction profiles of RNA-binding protein using the RBPInper tool.","authors":"Joseph A Cogan, Natalia Benova, Rene Kuklinkova, James R Boyne, Chinedu A Anene","doi":"10.1093/bioadv/vbae127","DOIUrl":"10.1093/bioadv/vbae127","url":null,"abstract":"<p><strong>Motivation: </strong>Recent RNA-centric experimental methods have significantly expanded our knowledge of proteins with known RNA-binding functions. However, the complete regulatory network and pathways for many of these RNA-binding proteins (RBPs) in different cellular contexts remain unknown. Although critical to understanding the role of RBPs in health and disease, experimentally mapping the RBP-RNA interactomes in every single context is an impossible task due to the cost and manpower required. Additionally, identifying relevant RNAs bound by RBPs is challenging due to their diverse binding modes and function.</p><p><strong>Results: </strong>To address these challenges, we developed the RBP interaction mapper (RBPInper), an integrative framework that discovers the global RBP interactome using statistical data fusion. Experiments on splicing factor proline and glutamine rich (SFPQ) datasets revealed a cogent global SFPQ interactome. Several biological processes associated with this interactome were previously linked with SFPQ function. Furthermore, we conducted tests using an independent dataset to assess the transferability of the SFPQ interactome to another context. The results demonstrated robust utility in generating interactomes that transfer to unseen cellular contexts. Overall, RBPInper is a fast and user-friendly method that enables a systems-level understanding of RBP functions by integrating multiple molecular datasets. The tool is designed with a focus on simplicity, minimal dependencies, and straightforward input requirements. 
This intentional design aims to empower everyday biologists, making it easy for them to incorporate the tool into their research.</p><p><strong>Availability and implementation: </strong>The source code, documentation, and installation instructions as well as results for use case are freely available at https://github.com/AneneLab/RBPInper. A user can easily compile similar datasets for a target RBP.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11374027/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142134633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-08-24. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae123
Yinqi Zhao, Qiran Jia, Jesse Goodrich, Burcu Darst, David V Conti
{"title":"An extension of latent unknown clustering integrating multi-omics data (LUCID) incorporating incomplete omics data.","authors":"Yinqi Zhao, Qiran Jia, Jesse Goodrich, Burcu Darst, David V Conti","doi":"10.1093/bioadv/vbae123","DOIUrl":"10.1093/bioadv/vbae123","url":null,"abstract":"<p><strong>Motivation: </strong>Latent unknown clustering integrating multi-omics data is a novel statistical model designed for multi-omics data analysis. It integrates omics data with exposures and an outcome through a latent cluster, elucidating how exposures influence processes reflected in multi-omics measurements, ultimately affecting an outcome. A significant challenge in multi-omics analysis is the issue of list-wise missingness. To address this, we extend the model to incorporate list-wise missingness within an integrated imputation framework, which can also handle sporadic missingness when necessary.</p><p><strong>Results: </strong>Simulation studies demonstrate that our integrated imputation approach produces consistent and less biased estimates, closely reflecting true underlying values. We applied this model to data from the ISGlobal/ATHLETE \"Exposome Data Challenge Event\" to explore the association between maternal exposure to hexachlorobenzene and childhood body mass index by integrating incomplete proteomics data from 1301 children. The model successfully estimated proteomics profiles for two clusters representing higher and lower body mass index, characterizing the potential profiles linking prenatal hexachlorobenzene levels and childhood body mass index.</p><p><strong>Availability and implementation: </strong>The proposed methods have been implemented in the R package <i>LUCIDus</i>. 
The source code is available at https://github.com/USCbiostats/LUCIDus.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368387/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-08-22. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae124
Santiago Prochetto, Renata Reinheimer, Georgina Stegmayer
{"title":"evolSOM: An R package for analyzing conservation and displacement of biological variables with self-organizing maps.","authors":"Santiago Prochetto, Renata Reinheimer, Georgina Stegmayer","doi":"10.1093/bioadv/vbae124","DOIUrl":"https://doi.org/10.1093/bioadv/vbae124","url":null,"abstract":"<p><strong>Motivation: </strong>Unraveling the connection between genes and traits is crucial for solving many biological puzzles. Ribonucleic acid molecules and proteins, derived from these genetic instructions, play crucial roles in shaping cell structures, influencing reactions, and guiding behavior. This fundamental biological principle links genetic makeup to observable traits, but integrating and extracting meaningful relationships from this complex, multimodal data present a significant challenge.</p><p><strong>Results: </strong>We introduce evolSOM, a novel R package that allows exploring and visualizing the conservation or displacement of biological variables, easing the integration of phenotypic and genotypic attributes. It enables the projection of multi-dimensional expression profiles onto interpretable two-dimensional grids, aiding in the identification of conserved or displaced genes/phenotypes across multiple conditions. Variables displaced together suggest membership to the same regulatory network, where the nature of the displacement may hold biological significance. The conservation or displacement of variables is automatically calculated and graphically presented by evolSOM. 
Its user-friendly interface and visualization capabilities enhance the accessibility of complex network analyses.</p><p><strong>Availability and implementation: </strong>The package is open-source under the GPL ( <math><mo>≥</mo></math> 3) and is available at https://github.com/sanprochetto/evolSOM, along with a step-by-step vignette and a full example dataset that can be accessed at https://github.com/sanprochetto/evolSOM/tree/main/inst/extdata.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11361812/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142115538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-08-21. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae122
Jose V Die
{"title":"refseqR: an R package for common computational operations with records on RefSeq collection.","authors":"Jose V Die","doi":"10.1093/bioadv/vbae122","DOIUrl":"10.1093/bioadv/vbae122","url":null,"abstract":"<p><strong>Summary: </strong>We introduce refseqR, an R package that offers a user-friendly solution, enabling common computational operations on RefSeq entries (GenBank, NCBI). The package is specifically designed to interact with records curated from the RefSeq database. Most importantly, the interoperability and integration with several Bioconductor objects allow connections to be applied to other projects.</p><p><strong>Availability and implementation: </strong>The package refseqR is implemented in R and published under the MIT open-source license. The source code, documentation, and usage instructions are available on CRAN (https://CRAN.R-project.org/package=refseqR).</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368385/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-08-20. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae116
Katerina Nastou, Mikaela Koutrouli, Sampo Pyysalo, Lars Juhl Jensen
{"title":"CoNECo: a Corpus for Named Entity recognition and normalization of protein Complexes.","authors":"Katerina Nastou, Mikaela Koutrouli, Sampo Pyysalo, Lars Juhl Jensen","doi":"10.1093/bioadv/vbae116","DOIUrl":"https://doi.org/10.1093/bioadv/vbae116","url":null,"abstract":"<p><strong>Motivation: </strong>Despite significant progress in biomedical information extraction, there is a lack of resources for Named Entity Recognition (NER) and Named Entity Normalization (NEN) of protein-containing complexes. Current resources inadequately address the recognition of protein-containing complex names across different organisms, underscoring the crucial need for a dedicated corpus.</p><p><strong>Results: </strong>We introduce the Complex Named Entity Corpus (CoNECo), an annotated corpus for NER and NEN of complexes. CoNECo comprises 1621 documents with 2052 entities, 1976 of which are normalized to Gene Ontology. We divided the corpus into training, development, and test sets and trained both a transformer-based and dictionary-based tagger on them. Evaluation on the test set demonstrated robust performance, with F-scores of 73.7% and 61.2%, respectively. 
Subsequently, we applied the best taggers for comprehensive tagging of the entire openly accessible biomedical literature.</p><p><strong>Availability and implementation: </strong>All resources, including the annotated corpus, training data, and code, are available to the community through Zenodo https://zenodo.org/records/11263147 and GitHub https://zenodo.org/records/10693653.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11474106/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
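The F-scores quoted for the two CoNECo taggers combine precision and recall into a single figure via their harmonic mean. The sketch below shows that arithmetic under the assumption of exact-match scoring of predicted against gold entity spans; the function names and the toy spans are hypothetical and are not taken from the CoNECo evaluation code, which may match entities differently.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    gold = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or gold == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / gold
    return 2 * precision * recall / (precision + recall)


def span_f_score(gold_spans: set, predicted_spans: set) -> float:
    """Score predicted entity spans against gold spans by exact match."""
    tp = len(gold_spans & predicted_spans)
    fp = len(predicted_spans - gold_spans)
    fn = len(gold_spans - predicted_spans)
    return f_score(tp, fp, fn)


# Hypothetical toy annotations: (start offset, end offset, entity type).
gold = {(0, 10, "complex"), (25, 40, "complex"), (50, 62, "complex")}
predicted = {(0, 10, "complex"), (25, 40, "complex"), (70, 80, "complex")}
toy_f = span_f_score(gold, predicted)  # 2 correct, 1 spurious, 1 missed
```

In the toy case precision and recall are both 2/3, so the F-score is also 2/3; because the harmonic mean is pulled toward the lower of the two values, a tagger cannot score well by maximising only one of them.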
{"title":"IVEA: an integrative variational Bayesian inference method for predicting enhancer-gene regulatory interactions.","authors":"Yasumasa Kimura, Yoshimasa Ono, Kotoe Katayama, Seiya Imoto","doi":"10.1093/bioadv/vbae118","DOIUrl":"10.1093/bioadv/vbae118","url":null,"abstract":"<p><strong>Motivation: </strong>Enhancers play critical roles in cell-type-specific transcriptional control. Despite the identification of thousands of candidate enhancers, unravelling their regulatory relationships with their target genes remains challenging. Therefore, computational approaches are needed to accurately infer enhancer-gene regulatory relationships.</p><p><strong>Results: </strong>In this study, we propose a new method, IVEA, that predicts enhancer-gene regulatory interactions by estimating promoter and enhancer activities. Its statistical model is based on the gene regulatory mechanism of transcriptional bursting, which is characterized by burst size and frequency controlled by promoters and enhancers, respectively. Using transcriptional readouts, chromatin accessibility, and chromatin contact data as inputs, promoter and enhancer activities were estimated using variational Bayesian inference, and the contribution of each enhancer-promoter pair to target gene transcription was calculated. Our analysis demonstrates that the proposed method can achieve high prediction accuracy and provide biologically relevant enhancer-gene regulatory interactions.</p><p><strong>Availability and implementation: </strong>The IVEA code is available on GitHub at https://github.com/yasumasak/ivea. 
The publicly available datasets used in this study are described in Supplementary Table S4.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11349192/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142082737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}