Bioinformatics advances | Pub Date: 2024-10-14 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae154
Dea Gogishvili, Emmanuel Minois-Genin, Jan van Eck, Sanne Abeln
{"title":"PatchProt: hydrophobic patch prediction using protein foundation models.","authors":"Dea Gogishvili, Emmanuel Minois-Genin, Jan van Eck, Sanne Abeln","doi":"10.1093/bioadv/vbae154","DOIUrl":"10.1093/bioadv/vbae154","url":null,"abstract":"<p><strong>Motivation: </strong>Hydrophobic patches on protein surfaces play important functional roles in protein-protein and protein-ligand interactions. Large hydrophobic surfaces are also involved in the progression of aggregation diseases. Predicting exposed hydrophobic patches from a protein sequence has proven to be a difficult task. Fine-tuning foundation models allows for adapting a model to the specific nuances of a new task using a much smaller dataset. Additionally, multitask deep learning offers a promising solution for addressing data gaps, simultaneously outperforming single-task methods.</p><p><strong>Results: </strong>In this study, we harnessed a recently released leading large language model Evolutionary Scale Models (ESM-2). Efficient fine-tuning of ESM-2 was achieved by leveraging a recently developed parameter-efficient fine-tuning method. This approach enabled comprehensive training of model layers without excessive parameters and without the need to include a computationally expensive multiple sequence analysis. We explored several related tasks, at local (residue) and global (protein) levels, to improve the representation of the model. As a result, our model, PatchProt, can not only predict hydrophobic patch areas but also outperforms existing methods at predicting primary tasks, including secondary structure and surface accessibility predictions. Importantly, our analysis shows that including related local tasks can improve predictions on more difficult global tasks. 
This research sets a new standard for sequence-based protein property prediction and highlights the remarkable potential of fine-tuning foundation models enriching the model representation by training over related tasks.</p><p><strong>Availability and implementation: </strong>https://github.com/Deagogishvili/chapter-multi-task.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae154"},"PeriodicalIF":2.4,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525051/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142559614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2024-10-11 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae153
Greta Bellinzona, Davide Sassera, Alexandre M J J Bonvin
{"title":"Accelerating protein-protein interaction screens with reduced AlphaFold-Multimer sampling.","authors":"Greta Bellinzona, Davide Sassera, Alexandre M J J Bonvin","doi":"10.1093/bioadv/vbae153","DOIUrl":"10.1093/bioadv/vbae153","url":null,"abstract":"<p><strong>Motivation: </strong>Discovering new protein-protein interactions (PPIs) across entire proteomes offers vast potential for understanding novel protein functions and elucidating system properties within or between organisms. While recent advances in computational structural biology, particularly AlphaFold-Multimer, have facilitated this task, scaling for large-scale screenings remains a challenge, requiring significant computational resources.</p><p><strong>Results: </strong>We evaluated the impact of reducing the number of models generated by AlphaFold-Multimer from five to one on the method's ability to distinguish true PPIs from false ones. Our evaluation was conducted on a dataset containing both intra- and inter-species PPIs, which included proteins from bacterial and eukaryotic sources. We demonstrate that reducing the sampling does not compromise the accuracy of the method, offering a faster, more efficient, and environmentally friendly solution for PPI predictions.</p><p><strong>Availability and implementation: </strong>The code used in this article is available at https://github.com/MIDIfactory/AlphaFastPPi. 
Note that the same can be achieved using the latest version of AlphaPulldown available at https://github.com/KosinskiLab/AlphaPulldown.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae153"},"PeriodicalIF":2.4,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11513016/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142513907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2024-10-09 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae149
Daniel R Olson, Travis J Wheeler
{"title":"ULTRA-effective labeling of tandem repeats in genomic sequence.","authors":"Daniel R Olson, Travis J Wheeler","doi":"10.1093/bioadv/vbae149","DOIUrl":"10.1093/bioadv/vbae149","url":null,"abstract":"<p><p>In the age of long-read sequencing, genomics researchers now have access to accurate repetitive DNA sequence (including satellites) that, due to the limitations of short-read sequencing, could previously be observed only as unmappable fragments. Tools that annotate repetitive sequence are now more important than ever, so that we can better understand newly uncovered repetitive sequences, and also so that we can mitigate errors in bioinformatic software caused by those repetitive sequences. To that end, we introduce the 1.0 release of our tool for identifying and annotating locally repetitive sequence, <b>U</b>LTRA <b>L</b>ocates <b>T</b>andemly <b>R</b>epetitive <b>A</b>reas (<i>ULTRA</i>). <i>ULTRA</i> is fast enough to use as part of an efficient annotation pipeline, produces state-of-the-art reliable coverage of repetitive regions containing many mutations, and provides interpretable statistics and labels for repetitive regions.</p><p><strong>Availability and implementation: </strong>ULTRA is released under an open source license, and is available for download at https://github.com/TravisWheelerLab/ULTRA.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae149"},"PeriodicalIF":2.4,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11580682/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142689857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2024-10-08 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae151
Heming Zhang, Dekang Cao, Zirui Chen, Xiuyuan Zhang, Yixin Chen, Cole Sessions, Carlos Cruchaga, Philip Payne, Guangfu Li, Michael Province, Fuhai Li
{"title":"mosGraphGen: a novel tool to generate multi-omics signaling graphs to facilitate integrative and interpretable graph AI model development.","authors":"Heming Zhang, Dekang Cao, Zirui Chen, Xiuyuan Zhang, Yixin Chen, Cole Sessions, Carlos Cruchaga, Philip Payne, Guangfu Li, Michael Province, Fuhai Li","doi":"10.1093/bioadv/vbae151","DOIUrl":"10.1093/bioadv/vbae151","url":null,"abstract":"<p><strong>Motivation: </strong>Multi-omics data, i.e. genomics, epigenomics, transcriptomics, and proteomics, characterize complex cellular signaling systems at multiple levels and from multiple views and provide a holistic view of complex cellular signaling pathways. However, it remains challenging to integrate and interpret multi-omics data for mining critical biomarkers. Graph AI models have been widely used to analyze graph-structured datasets, and are ideal for integrative multi-omics data analysis because they can naturally integrate and represent multi-omics data as a biologically meaningful multi-level signaling graph and interpret multi-omics data via graph node and edge ranking analysis. Nevertheless, it is nontrivial for graph-AI model developers to pre-analyze multi-omics data and convert the data into biologically meaningful graphs, which can be directly fed into graph-AI models.</p><p><strong>Results: </strong>To resolve this challenge, we developed mosGraphGen (multi-omics signaling graph generator), generating Multi-omics Signaling graphs (mos-graph) of individual samples by mapping multi-omics data onto a biologically meaningful multi-level background signaling network with data normalization by aggregating measurements and aligning to the reference genome. With mosGraphGen, AI model developers can directly apply and evaluate their models using these mos-graphs. 
mosGraphGen is demonstrated on two widely used multi-omics datasets: The Cancer Genome Atlas (TCGA) and Alzheimer's disease (AD) samples.</p><p><strong>Availability and implementation: </strong>The code of mosGraphGen is open-source and publicly available via GitHub: https://github.com/FuhaiLiAiLab/mosGraphGen.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae151"},"PeriodicalIF":2.4,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11540438/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142592400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2024-10-08 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae150
Maria Tarradas-Alemany, Sandra Martínez-Puchol, Cristina Mejías-Molina, Marta Itarte, Marta Rusiñol, Sílvia Bofill-Mas, Josep F Abril
{"title":"CAPTVRED: an automated pipeline for viral tracking and discovery from capture-based metagenomics samples.","authors":"Maria Tarradas-Alemany, Sandra Martínez-Puchol, Cristina Mejías-Molina, Marta Itarte, Marta Rusiñol, Sílvia Bofill-Mas, Josep F Abril","doi":"10.1093/bioadv/vbae150","DOIUrl":"https://doi.org/10.1093/bioadv/vbae150","url":null,"abstract":"<p><strong>Summary: </strong>Target Enrichment Sequencing or Capture-based metagenomics has emerged as an approach of interest for viral metagenomics in complex samples. However, these datasets are usually analyzed with standard downstream bioinformatics analyses. CAPTVRED (<i>Capture-based metagenomics Analysis Pipeline for tracking ViRal species from Environmental Datasets</i>) has been designed to assess the virome present in complex samples, with a particular focus on those obtained by the Target Enrichment Sequencing approach. This work aims to provide a user-friendly tool that complements this sequencing approach for the total or partial virome description, especially from environmental matrices. It includes a setup module which allows preparation and adjustment of the pipeline to any capture panel directed to a set of species of interest. 
The tool also aims to reduce time and computational cost, as well as to provide comprehensive, reproducible, and accessible results while being easy to customize, set up, and install.</p><p><strong>Availability and implementation: </strong>Source code and test datasets are freely available at the GitHub repository: https://github.com/CompGenLabUB/CAPTVRED.git.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae150"},"PeriodicalIF":2.4,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495672/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142513908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2024-10-07 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae145
Masato Tsutsui, Mariko Okada
{"title":"DynProfiler: a Python package for comprehensive analysis and interpretation of signaling dynamics leveraged by deep learning techniques.","authors":"Masato Tsutsui, Mariko Okada","doi":"10.1093/bioadv/vbae145","DOIUrl":"10.1093/bioadv/vbae145","url":null,"abstract":"<p><strong>Summary: </strong>Signaling dynamics encode important features and regulatory mechanisms of biological systems, and recent studies have reported the use of simulated signaling dynamics with mechanistic modeling as biomarkers for human diseases. Given the success of deep learning techniques, it is expected that they can extract informative patterns from simulation results more effectively than traditional approaches involving manual feature selection, which can be used for subsequent analyses, such as patient stratification and survival prediction. Here, we propose DynProfiler, which utilizes the entire signaling dynamics, including intermediate variables, as input and leverages deep learning techniques to extract informative features without requiring any labels. Furthermore, DynProfiler incorporates a modern explainable AI solution to provide quantitative time-dependent importance scores for each dynamics. Using simulated dynamics of patients with breast cancer as an example, we demonstrate DynProfiler's ability to extract high-quality features that can predict mortality risk and identify important dynamics, highlighting upregulated phosphorylated GSK3β as a biomarker for poor prognosis. 
Overall, this tool can be useful for clinical application, as well as for elucidating biological system dynamics.</p><p><strong>Availability and implementation: </strong>The DynProfiler Python library is available in GitHub at https://github.com/okadalabipr/DynProfiler.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae145"},"PeriodicalIF":2.4,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464416/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142402170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2024-10-04 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae140
{"title":"Correction to: Utilizing biological experimental data and molecular dynamics for the classification of mutational hotspots through machine learning.","authors":"","doi":"10.1093/bioadv/vbae140","DOIUrl":"https://doi.org/10.1093/bioadv/vbae140","url":null,"abstract":"<p><p>[This corrects the article DOI: 10.1093/bioadv/vbae125.].</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae140"},"PeriodicalIF":2.4,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11453097/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142382608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2024-10-03 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae148
Veronica Paparozzi, Christine Nardini
{"title":"tidysbml: R/Bioconductor package for SBML extraction into dataframes.","authors":"Veronica Paparozzi, Christine Nardini","doi":"10.1093/bioadv/vbae148","DOIUrl":"https://doi.org/10.1093/bioadv/vbae148","url":null,"abstract":"<p><strong>Summary: </strong>We present <i>tidysbml</i>, an R package able to perform <i>compartments</i>, <i>species</i>, and <i>reactions</i> data extraction from Systems Biology Markup Language (SBML) documents (up to Level 3) in tabular data structures (i.e. R dataframes) to easily access and handle the richness of the biological information. Thanks to its output format, the package facilitates data manipulation, enabling manageable construction, and therefore analysis, of custom networks, as well as data retrieval, by means of R packages such as <i>igraph</i>, <i>RCy3</i>, and <i>biomaRt</i>. Exemplar data (i.e. SBML files) are extracted from Reactome.</p><p><strong>Availability and implementation: </strong>The <i>tidysbml</i> R package is distributed under CC BY 4.0 License and can be found publicly available in Bioconductor (https://bioconductor.org/packages/tidysbml) and on GitHub (https://github.com/veronicapaparozzi/tidysbml).</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae148"},"PeriodicalIF":2.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479578/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2024-10-03 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae147
Rodolfo S Allendes Osorio, Yuji Kosugi, Johan T Nyström-Persson, Kenji Mizuguchi, Yayoi Natsume-Kitatani
{"title":"A modern multi-omics data exploration experience with Panomicon.","authors":"Rodolfo S Allendes Osorio, Yuji Kosugi, Johan T Nyström-Persson, Kenji Mizuguchi, Yayoi Natsume-Kitatani","doi":"10.1093/bioadv/vbae147","DOIUrl":"https://doi.org/10.1093/bioadv/vbae147","url":null,"abstract":"<p><strong>Summary: </strong>To address the challenges of the storage, sharing, and analysis of multi-omics data, here we introduce the newest version of Panomicon, which includes an improved underlying data model, a new registration and access control service, and seamless integration with other services (such as TargetMine for data enrichment analysis), all delivered in a completely new, more user-friendly web application.</p><p><strong>Availability and implementation: </strong>Panomicon is available online at https://panomicon.nibiohn.go.jp. Unregistered users can access the publicly available data uploaded to Panomicon using the following account: user: guest, password: anonymous. Source code for the application is also freely available under a GNU license at https://github.com/Toxygates/Panomicon/. A brief user guide for the new features of Panomicon is provided as supplementary material online.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae147"},"PeriodicalIF":2.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520228/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"iTraNet: a web-based platform for integrated trans-omics network visualization and analysis.","authors":"Hikaru Sugimoto, Keigo Morita, Dongzi Li, Yunfan Bai, Matthias Mattanovich, Shinya Kuroda","doi":"10.1093/bioadv/vbae141","DOIUrl":"https://doi.org/10.1093/bioadv/vbae141","url":null,"abstract":"<p><strong>Motivation: </strong>Visualization and analysis of biological networks play crucial roles in understanding living systems. Biological networks include diverse types, from gene regulatory networks and protein-protein interactions to metabolic networks. Metabolic networks include substrates, products, and enzymes, which are regulated by allosteric mechanisms and gene expression. However, the analysis of these diverse omics types is challenging due to the diversity of databases and the complexity of network analysis.</p><p><strong>Results: </strong>We developed iTraNet, a web application that visualizes and analyses trans-omics networks involving four types of networks: gene regulatory networks, protein-protein interactions, metabolic networks, and metabolite exchange networks. Using iTraNet, we found that in wild-type mice, hub molecules within the network tended to respond to glucose administration, whereas in <i>ob/ob</i> mice, this tendency disappeared. 
With its ability to facilitate network analysis, we anticipate that iTraNet will help researchers gain insights into living systems.</p><p><strong>Availability and implementation: </strong>iTraNet is available at https://itranet.streamlit.app/.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae141"},"PeriodicalIF":2.4,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493990/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142513909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}