{"title":"Using deep learning and large protein language models to predict protein-membrane interfaces of peripheral membrane proteins.","authors":"Dimitra Paranou, Alexios Chatzigoulas, Zoe Cournia","doi":"10.1093/bioadv/vbae078","DOIUrl":"10.1093/bioadv/vbae078","url":null,"abstract":"<p><strong>Motivation: </strong>Characterizing interactions at the protein-membrane interface is crucial as abnormal peripheral protein-membrane attachment is involved in the onset of many diseases. However, a limiting factor in studying and understanding protein-membrane interactions is that the membrane-binding domains of peripheral membrane proteins (PMPs) are typically unknown. By applying artificial intelligence techniques in the context of natural language processing (NLP), the accuracy and prediction time for protein-membrane interface analysis can be significantly improved compared to existing methods. Here, we assess whether NLP and protein language models (pLMs) can be used to predict membrane-interacting amino acids for PMPs.</p><p><strong>Results: </strong>We utilize available experimental data and generate protein embeddings from two pLMs (ProtTrans and ESM) to train classifier models. Overall, the results demonstrate the first proof of concept study and the promising potential of using deep learning and pLMs to predict protein-membrane interfaces for PMPs faster, with similar accuracy, and without the need for 3D structural data compared to existing tools.</p><p><strong>Availability and implementation: </strong>The code is available at https://github.com/zoecournia/pLM-PMI. 
All data are available in the Supplementary material.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae078"},"PeriodicalIF":2.4,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11572487/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-05-24. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae074
Felicitas Kindel, Sebastian Triesch, Urte Schlüter, Laura Alexandra Randarevitch, Vanessa Reichel-Deland, Andreas P M Weber, Alisandra K Denton
{"title":"Predmoter-cross-species prediction of plant promoter and enhancer regions.","authors":"Felicitas Kindel, Sebastian Triesch, Urte Schlüter, Laura Alexandra Randarevitch, Vanessa Reichel-Deland, Andreas P M Weber, Alisandra K Denton","doi":"10.1093/bioadv/vbae074","DOIUrl":"10.1093/bioadv/vbae074","url":null,"abstract":"<p><strong>Motivation: </strong>Identifying <i>cis</i>-regulatory elements (CREs) is crucial for analyzing gene regulatory networks. Next generation sequencing methods were developed to identify CREs but represent a considerable expenditure for targeted analysis of few genomic loci. Thus, predicting the outputs of these methods would significantly cut costs and time investment.</p><p><strong>Results: </strong>We present Predmoter, a deep neural network that predicts base-wise Assay for Transposase Accessible Chromatin using sequencing (ATAC-seq) and histone Chromatin immunoprecipitation DNA-sequencing (ChIP-seq) read coverage for plant genomes. Predmoter uses only the DNA sequence as input. We trained our final model on 21 species for 13 of which ATAC-seq data and for 17 of which ChIP-seq data was publicly available. We evaluated our models on <i>Arabidopsis thaliana</i> and <i>Oryza sativa</i>. Our best models showed accurate predictions in peak position and pattern for ATAC- and histone ChIP-seq. Annotating putatively accessible chromatin regions provides valuable input for the identification of CREs. In conjunction with other <i>in silico</i> data, this can significantly reduce the search space for experimentally verifiable DNA-protein interaction pairs.</p><p><strong>Availability and implementation: </strong>The source code for Predmoter is available at: https://github.com/weberlab-hhu/Predmoter. 
Predmoter takes a fasta file as input and outputs h5, and optionally bigWig and bedGraph files.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae074"},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11150885/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141263386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The axes of biology: a novel axes-based network embedding paradigm to decipher the functional mechanisms of the cell.","authors":"Sergio Doria-Belenguer, Alexandros Xenos, Gaia Ceddia, Noël Malod-Dognin, Nataša Pržulj","doi":"10.1093/bioadv/vbae075","DOIUrl":"10.1093/bioadv/vbae075","url":null,"abstract":"<p><strong>Summary: </strong>Common approaches for deciphering biological networks involve network embedding algorithms. These approaches strictly focus on clustering the genes' embedding vectors and interpreting such clusters to reveal the hidden information of the networks. However, the difficulty in interpreting the genes' clusters and the limitations of the functional annotations' resources hinder the identification of the currently unknown cell's functioning mechanisms. We propose a new approach that shifts this functional exploration from the embedding vectors of genes in space to the axes of the space itself. Our methodology better disentangles biological information from the embedding space than the classic gene-centric approach. Moreover, it uncovers new data-driven functional interactions that are unregistered in the functional ontologies, but biologically coherent. Furthermore, we exploit these interactions to define new higher-level annotations that we term Axes-Specific Functional Annotations and validate them through literature curation. 
Finally, we leverage our methodology to discover evolutionary connections between cellular functions and the evolution of species.</p><p><strong>Availability and implementation: </strong>Data and source code can be accessed at https://gitlab.bsc.es/sdoria/axes-of-biology.git.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae075"},"PeriodicalIF":0.0,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11142626/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-05-22. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae069
Christos A Ouzounis
{"title":"Biology's transformation: from observation through experiment to computation.","authors":"Christos A Ouzounis","doi":"10.1093/bioadv/vbae069","DOIUrl":"10.1093/bioadv/vbae069","url":null,"abstract":"<p><strong>Summary: </strong>We explore the nuanced temporal and epistemological distinctions among natural sciences, particularly the contrasting treatment of time and the interplay between theory and experimentation. Physics, an exemplar of mature science, relies on theoretical models for predictability and simulations. In contrast, biology, traditionally experimental, is witnessing a computational surge, with data analytics and simulations reshaping its research paradigms. Despite these strides, a unified theoretical framework in biology remains elusive. We propose that contemporary global challenges might usher in a renewed emphasis, presenting an opportunity for the establishment of a novel theoretical underpinning for the life sciences.</p><p><strong>Availability and implementation: </strong>https://github.com/ouzounis/CLS-emerges Data in Json format, Images in PNG format.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae069"},"PeriodicalIF":0.0,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11127110/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141154743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-05-08. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae066
Willow Kion-Crosby, Lars Barquist
{"title":"Network depth affects inference of gene sets from bacterial transcriptomes using denoising autoencoders.","authors":"Willow Kion-Crosby, Lars Barquist","doi":"10.1093/bioadv/vbae066","DOIUrl":"10.1093/bioadv/vbae066","url":null,"abstract":"<p><strong>Summary: </strong>The increasing number of publicly available bacterial gene expression data sets provides an unprecedented resource for the study of gene regulation in diverse conditions, but emphasizes the need for self-supervised methods for the automated generation of new hypotheses. One approach for inferring coordinated regulation from bacterial expression data is through neural networks known as denoising autoencoders (DAEs) which encode large datasets in a reduced bottleneck layer. We have generalized this application of DAEs to include deep networks and explore the effects of network architecture on gene set inference using deep learning. We developed a DAE-based pipeline to extract gene sets from transcriptomic data in <i>Escherichia coli</i>, validate our method by comparing inferred gene sets with known pathways, and have used this pipeline to explore how the choice of network architecture impacts gene set recovery. We find that increasing network depth leads the DAEs to explain gene expression in terms of fewer, more concisely defined gene sets, and that adjusting the width results in a tradeoff between generalizability and biological inference. 
Finally, leveraging our understanding of the impact of DAE architecture, we apply our pipeline to an independent uropathogenic <i>E. coli</i> dataset to identify genes uniquely induced during human colonization.</p><p><strong>Availability and implementation: </strong>https://github.com/BarquistLab/DAE_architecture_exploration.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae066"},"PeriodicalIF":2.4,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11256956/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-05-08. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae063
Jingcheng Yang, Mo Sun, Zihan Ran, Taehwan Yang, Deepali L Kundnani, Francesca Storici, Penghao Xu
{"title":"rNMPID: a database for riboNucleoside MonoPhosphates in DNA.","authors":"Jingcheng Yang, Mo Sun, Zihan Ran, Taehwan Yang, Deepali L Kundnani, Francesca Storici, Penghao Xu","doi":"10.1093/bioadv/vbae063","DOIUrl":"10.1093/bioadv/vbae063","url":null,"abstract":"<p><strong>Motivation: </strong>Ribonucleoside monophosphates (rNMPs) are the most abundant non-standard nucleotides embedded in genomic DNA. If the presence of rNMP in DNA cannot be controlled, it can lead to genome instability. The actual regulatory functions of rNMPs in DNA remain mainly unknown. Considering the association between rNMP embedment and various diseases and cancer, the phenomenon of rNMP embedment in DNA has become a prominent area of research in recent years.</p><p><strong>Results: </strong>We introduce the rNMPID database, which is the first database revealing rNMP-embedment characteristics, strand bias, and preferred incorporation patterns in the genomic DNA of samples from bacterial to human cells of different genetic backgrounds. The rNMPID database uses datasets generated by different rNMP-mapping techniques. It provides the researchers with a solid foundation to explore the features of rNMP embedded in the genomic DNA of multiple sources, and their association with cellular functions, and, in future, disease. 
It also significantly benefits researchers in the fields of genetics and genomics who aim to integrate their studies with the rNMP-embedment data.</p><p><strong>Availability and implementation: </strong>rNMPID is freely accessible on the web at https://www.rnmpid.org.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae063"},"PeriodicalIF":2.4,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11088741/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140913559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-05-03. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae040
Anna-Sophie Fiston-Lavier, Sandra Dérozier, Guy Perrière, Marie-France Sagot
{"title":"ISMB/ECCB 2023 organization benefited from the strengths of the French bioinformatics community.","authors":"Anna-Sophie Fiston-Lavier, Sandra Dérozier, Guy Perrière, Marie-France Sagot","doi":"10.1093/bioadv/vbae040","DOIUrl":"https://doi.org/10.1093/bioadv/vbae040","url":null,"abstract":"","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae040"},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11076915/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140892264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-05-02. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae062
Tomás V Waichman, M L Vercesi, Ariel A Berardino, Maximiliano S Beckel, Damiana Giacomini, Natalí B Rasetto, Magalí Herrero, Daniela J Di Bella, Paola Arlotta, Alejandro F Schinder, Ariel Chernomoretz
{"title":"scX: a user-friendly tool for scRNAseq exploration.","authors":"Tomás V Waichman, M L Vercesi, Ariel A Berardino, Maximiliano S Beckel, Damiana Giacomini, Natalí B Rasetto, Magalí Herrero, Daniela J Di Bella, Paola Arlotta, Alejandro F Schinder, Ariel Chernomoretz","doi":"10.1093/bioadv/vbae062","DOIUrl":"10.1093/bioadv/vbae062","url":null,"abstract":"<p><strong>Motivation: </strong>Single-cell RNA sequencing (scRNAseq) has transformed our ability to explore biological systems. Nevertheless, proficient expertise is essential for handling and interpreting the data.</p><p><strong>Results: </strong>In this article, we present scX, an R package built on the Shiny framework that streamlines the analysis, exploration, and visualization of single-cell experiments. With an interactive graphic interface, implemented as a web application, scX provides easy access to key scRNAseq analyses, including marker identification, gene expression profiling, and differential gene expression analysis. Additionally, scX seamlessly integrates with commonly used single-cell Seurat and SingleCellExperiment R objects, resulting in efficient processing and visualization of varied datasets. Overall, scX serves as a valuable and user-friendly tool for effortless exploration and sharing of single-cell data, simplifying some of the complexities inherent in scRNAseq analysis.</p><p><strong>Availability and implementation: </strong>Source code can be downloaded from https://github.com/chernolabs/scX. 
A docker image is available from dockerhub as chernolabs/scx.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae062"},"PeriodicalIF":2.4,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11109472/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141082442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-04-24. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae061
Jose L Figueroa, Andrew Redinbo, Ajay Panyala, Sean Colby, Maren L Friesen, Lisa Tiemann, Richard Allen White
{"title":"MerCat2: a versatile <i>k</i>-mer counter and diversity estimator for database-independent property analysis obtained from omics data.","authors":"Jose L Figueroa, Andrew Redinbo, Ajay Panyala, Sean Colby, Maren L Friesen, Lisa Tiemann, Richard Allen White","doi":"10.1093/bioadv/vbae061","DOIUrl":"10.1093/bioadv/vbae061","url":null,"abstract":"<p><strong>Motivation: </strong>MerCat2 (\"Mer-Catenate2\") is a versatile, parallel, scalable and modular property software package for robustly analyzing features in omics data. Using massively parallel sequencing raw reads, assembled contigs, and protein sequences from any platform as input, MerCat2 performs <i>k</i>-mer counting of any length <i>k</i>, resulting in feature abundance counts tables, quality control reports, protein feature metrics, and graphical representation (i.e. principal component analysis (PCA)).</p><p><strong>Results: </strong>MerCat2 allows for direct analysis of data properties in a database-independent manner that initializes all data, which other profilers and assembly-based methods cannot perform. MerCat2 represents an integrated tool to illuminate omics data within a sample for rapid cross-examination and comparisons.</p><p><strong>Availability and implementation: </strong>MerCat2 is written in Python and distributed under a BSD-3 license. The source code of MerCat2 is freely available at https://github.com/raw-lab/mercat2. MerCat2 is compatible with Python 3 on Mac OS X and Linux. 
MerCat2 can also be easily installed using bioconda: mamba create -n mercat2 -c conda-forge -c bioconda mercat2.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae061"},"PeriodicalIF":0.0,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11090762/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140923738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances. Pub Date: 2024-04-19. eCollection Date: 2024-01-01. DOI: 10.1093/bioadv/vbae059
Alexander R Bennett, Daniel Bojar
{"title":"Syntactic sugars: crafting a regular expression framework for glycan structures.","authors":"Alexander R Bennett, Daniel Bojar","doi":"10.1093/bioadv/vbae059","DOIUrl":"https://doi.org/10.1093/bioadv/vbae059","url":null,"abstract":"<p><strong>Motivation: </strong>Structural analysis of glycans poses significant challenges in glycobiology due to their complex sequences. Research questions such as analyzing the sequence content of the α1-6 branch in <i>N</i>-glycans, are biologically meaningful yet can be hard to automate.</p><p><strong>Results: </strong>Here, we introduce a regular expression system, designed for glycans, feature-complete, and closely aligned with regular expression formatting. We use this to annotate glycan motifs of arbitrary complexity, perform differential expression analysis on designated sequence stretches, or elucidate branch-specific binding specificities of lectins in an automated manner. We are confident that glycan regular expressions will empower computational analyses of these sequences.</p><p><strong>Availability and implementation: </strong>Our regular expression framework for glycans is implemented in Python and is incorporated into the open-source glycowork package (version 1.1+). 
Code and documentation are available at https://github.com/BojarLab/glycowork/blob/master/glycowork/motif/regex.py.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae059"},"PeriodicalIF":0.0,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11069104/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140873530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}