{"title":"SSL-VQ: vector-quantized variational autoencoders for semi-supervised prediction of therapeutic targets across diverse diseases.","authors":"Satoko Namba, Chen Li, Noriko Yuyama Otani, Yoshihiro Yamanishi","doi":"10.1093/bioinformatics/btaf039","DOIUrl":"10.1093/bioinformatics/btaf039","url":null,"abstract":"<p><strong>Motivation: </strong>Identifying effective therapeutic targets poses a challenge in drug discovery, especially for uncharacterized diseases without known therapeutic targets (e.g. rare diseases, intractable diseases).</p><p><strong>Results: </strong>This study presents a novel machine learning approach using multimodal vector-quantized variational autoencoders (VQ-VAEs) for predicting therapeutic target molecules across diseases. To address the lack of known therapeutic target-disease associations, we incorporate the information on uncharacterized diseases without known targets or uncharacterized proteins without known indications (applicable diseases) in the semi-supervised learning (SSL) framework. The method integrates disease-specific and protein perturbation profiles with genetic perturbations (e.g. gene knockdowns and gene overexpressions) at the transcriptome level. Cross-cell representation learning, facilitated by VQ-VAEs, was performed to extract informative features from protein perturbation profiles across diverse human cell types. Concurrently, cross-disease representation learning was performed, leveraging VQ-VAE, to extract informative features reflecting disease states from disease-specific profiles. The model's applicability to uncharacterized diseases or proteins is enhanced by considering the consistency between disease-specific and patient-specific signatures. The efficacy of the method is demonstrated across three practical scenarios for 79 diseases: target repositioning for target-disease pairs, new target prediction for uncharacterized diseases, and new indication prediction for uncharacterized proteins. This method is expected to be valuable for identifying therapeutic targets across various diseases.</p><p><strong>Availability and implementation: </strong>Code: github.com/YamanishiLab/SSL-VQ and Data: 10.5281/zenodo.14644837.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11842052/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143070123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NetworkCommons: bridging data, knowledge, and methods to build and evaluate context-specific biological networks.","authors":"Victor Paton, Denes Türei, Olga Ivanova, Sophia Müller-Dott, Pablo Rodriguez-Mier, Veronica Venafra, Livia Perfetto, Martin Garrido-Rodriguez, Julio Saez-Rodriguez","doi":"10.1093/bioinformatics/btaf048","DOIUrl":"10.1093/bioinformatics/btaf048","url":null,"abstract":"<p><strong>Summary: </strong>We present NetworkCommons, a platform for integrating prior knowledge, omics data, and network inference methods, facilitating their usage and evaluation. NetworkCommons aims to be an infrastructure for the network biology community that supports the development of better methods and benchmarks, by enhancing interoperability and integration.</p><p><strong>Availability and implementation: </strong>NetworkCommons is implemented in Python and offers programmatic access to multiple omics datasets, network inference methods, and benchmarking setups. It is a free software, available at https://github.com/saezlab/networkcommons, and deposited in Zenodo at https://doi.org/10.5281/zenodo.14719118.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11846666/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143191532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural basis of differential gene expression at eQTLs loci from high-resolution ensemble models of 3D single-cell chromatin conformations.","authors":"Lin Du, Hammad Farooq, Pourya Delafrouz, Jie Liang","doi":"10.1093/bioinformatics/btaf050","DOIUrl":"10.1093/bioinformatics/btaf050","url":null,"abstract":"<p><strong>Motivation: </strong>Techniques such as high-throughput chromosome conformation capture (Hi-C) have provided a wealth of information on nucleus organization and genome important for understanding gene expression regulation. Genome-Wide Association Studies have identified numerous loci associated with complex traits. Expression quantitative trait loci (eQTL) studies have further linked the genetic variants to alteration in expression levels of associated target genes across individuals. However, the functional roles of many eQTLs in noncoding regions remain unclear. Current joint analyses of Hi-C and eQTLs data lack advanced computational tools, limiting what can be learned from these data.</p><p><strong>Results: </strong>We developed a computational method for simultaneous analysis of Hi-C and eQTL data, capable of identifying a small set of nonrandom interactions from all Hi-C interactions. Using these nonrandom interactions, we reconstructed large ensembles (×10<sup>5</sup>) of high-resolution single-cell 3D chromatin conformations with thorough sampling, accurately replicating Hi-C measurements. Our results revealed many-body interactions in chromatin conformation at the single-cell level within eQTL loci, providing a detailed view of how 3D chromatin structures form the physical foundation for gene regulation, including how genetic variants of eQTLs affect the expression of associated eGenes. Furthermore, our method can deconvolve chromatin heterogeneity and investigate the spatial associations of eQTLs and eGenes at subpopulation level, revealing their regulatory impacts on gene expression. Together, ensemble modeling of thoroughly sampled single-cell chromatin conformations combined with eQTL data, helps decipher how 3D chromatin structures provide the physical basis for gene regulation, expression control, and aid in understanding the overall structure-function relationships of genome organization.</p><p><strong>Availability and implementation: </strong>It is available at https://github.com/uic-liang-lab/3DChromFolding-eQTL-Loci.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11835231/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143076694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"tagtango: an application to compare single-cell annotations.","authors":"Bernat Bramon Mora, Helen Lindsay, Antonin Thiébaut, Kenneth D Stuart, Raphael Gottardo","doi":"10.1093/bioinformatics/btaf012","DOIUrl":"10.1093/bioinformatics/btaf012","url":null,"abstract":"<p><strong>Summary: </strong>In this article, we present tagtango, an innovative R package and web application designed for robust and intuitive comparison of single-cell clusters and annotations. It offers an interactive platform that simplifies the exploration of differences and similarities among different clustering and annotation methods. Leveraging single-cell data analysis and different visualizations, it allows researchers to dissect the underlying biological differences across groups. tagtango is a user-friendly application that is portable and works seamlessly across multiple operating systems.</p><p><strong>Availability and implementation: </strong>tagtango is freely available at https://github.com/bernibra/tagtango as an R package as well as an online web service at https://tagtango.unil.ch.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11814489/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142973834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scatterbar: an R package for visualizing proportional data across spatially resolved coordinates.","authors":"Dee Velazquez, Jean Fan","doi":"10.1093/bioinformatics/btaf047","DOIUrl":"10.1093/bioinformatics/btaf047","url":null,"abstract":"<p><strong>Motivation: </strong>Displaying proportional data across many spatially resolved coordinates is a challenging but important data visualization task, particularly for spatially resolved transcriptomics data. Scatter pie plots are one type of commonly used data visualization for such data but present perceptual challenges that may lead to difficulties in interpretation. Increasing the visual saliency of such data visualizations can help viewers more accurately identify proportional trends and compare proportional differences across spatial locations.</p><p><strong>Results: </strong>We developed scatterbar, an open-source R package that extends ggplot2, to visualize proportional data across many spatially resolved coordinates using scatter stacked bar plots. We apply scatterbar to visualize deconvolved cell-type proportions from a spatial transcriptomics dataset of the adult mouse brain to demonstrate how scatter stacked bar plots can enhance the distinguishability of proportional distributions compared to scatter pie plots.</p><p><strong>Availability and implementation: </strong>scatterbar is available on CRAN https://cran.r-project.org/package=scatterbar with additional documentation and tutorials at https://jef.works/scatterbar/.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11829801/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143071297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"tttrlib: modular software for integrating fluorescence spectroscopy, imaging, and molecular modeling.","authors":"Thomas-Otavio Peulen, Katherina Hemmen, Annemarie Greife, Benjamin M Webb, Suren Felekyan, Andrej Sali, Claus A M Seidel, Hugo Sanabria, Katrin G Heinze","doi":"10.1093/bioinformatics/btaf025","DOIUrl":"10.1093/bioinformatics/btaf025","url":null,"abstract":"<p><strong>Summary: </strong>We introduce software for reading, writing and processing fluorescence single-molecule and image spectroscopy data and developing analysis pipelines to unify various spectroscopic analysis tools. Our software can be used for processing multiple experiment types, e.g. for time-resolved single-molecule spectroscopy, laser scanning microscopy, fluorescence correlation spectroscopy and image correlation spectroscopy. The software is file format agnostic and processes multiple time-resolved data formats and outputs. Our software eliminates the need for data conversion and mitigates data archiving issues.</p><p><strong>Availability and implementation: </strong>tttrlib is available via pip (https://pypi.org/project/tttrlib/) and bioconda while the open-source code is available via GitHub (https://github.com/fluorescence-tools/tttrlib). Presented examples and additional documentation demonstrating how to implement in vitro and live-cell image spectroscopy analysis are available at https://docs.peulen.xyz/tttrlib and https://zenodo.org/records/14002224.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11796090/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143018154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustly interrogating machine learning-based scoring functions: what are they learning?","authors":"Guy Durant, Fergus Boyles, Kristian Birchall, Brian Marsden, Charlotte M Deane","doi":"10.1093/bioinformatics/btaf040","DOIUrl":"10.1093/bioinformatics/btaf040","url":null,"abstract":"<p><strong>Motivation: </strong>Machine learning-based scoring functions (MLBSFs) have been found to exhibit inconsistent performance on different benchmarks and be prone to learning dataset bias. For the field to develop MLBSFs that learn a generalizable understanding of physics, a more rigorous understanding of how they perform is required.</p><p><strong>Results: </strong>In this work, we compared the performance of a diverse set of popular MLBSFs (RFScore, SIGN, OnionNet-2, Pafnucy, and PointVS) to our proposed baseline models that can only learn dataset biases on a range of benchmarks. We found that these baseline models were competitive in accuracy to these MLBSFs in almost all proposed benchmarks, indicating these models only learn dataset biases. Our tests and provided platform, ToolBoxSF, will enable researchers to robustly interrogate MLBSF performance and determine the effect of dataset biases on their predictions.</p><p><strong>Availability and implementation: </strong>https://github.com/guydurant/toolboxsf.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11821266/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143061530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional profiling of the sequence stockpile: a protein pair-based assessment of in silico prediction tools.","authors":"R Prabakaran, Yana Bromberg","doi":"10.1093/bioinformatics/btaf035","DOIUrl":"10.1093/bioinformatics/btaf035","url":null,"abstract":"<p><strong>Motivation: </strong>In silico functional annotation of proteins is crucial to narrowing the sequencing-accelerated gap in our understanding of protein activities. Numerous function annotation methods exist, and their ranks have been growing, particularly so with the recent deep learning-based developments. However, it is unclear if these tools are truly predictive. As we are not aware of any methods that can identify new terms in functional ontologies, we ask if they can, at least, identify molecular functions of proteins that are non-homologous to or far-removed from known protein families.</p><p><strong>Results: </strong>Here, we explore the potential and limitations of the existing methods in predicting the molecular functions of thousands of such proteins. Lacking the \"ground truth\" functional annotations, we transformed the assessment of function prediction into evaluation of functional similarity of protein pairs that likely share function but are unlike any of the currently functionally annotated sequences. Notably, our approach transcends the limitations of functional annotation vocabularies, providing a means to assess different-ontology annotation methods. We find that most existing methods are limited to identifying functional similarity of homologous sequences and fail to predict the function of proteins lacking reference. Curiously, despite their seemingly unlimited by-homology scope, deep learning methods also have trouble capturing the functional signal encoded in protein sequence. We believe that our work will inspire the development of a new generation of methods that push boundaries and promote exploration and discovery in the molecular function domain.</p><p><strong>Availability and implementation: </strong>The data underlying this article are available at https://doi.org/10.6084/m9.figshare.c.6737127.v3. The code used to compute siblings is available openly at https://bitbucket.org/bromberglab/siblings-detector/.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11821270/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143034899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COBREXA 2: tidy and scalable construction of complex metabolic models.","authors":"Miroslav Kratochvíl, St Elmo Wilken, Oliver Ebenhöh, Reinhard Schneider, Venkata P Satagopam","doi":"10.1093/bioinformatics/btaf056","DOIUrl":"10.1093/bioinformatics/btaf056","url":null,"abstract":"<p><strong>Summary: </strong>Constraint-based metabolic models offer a scalable framework to investigate biological systems using optimality principles. Construction and simulation of detailed models that utilize multiple kinds of constraint systems pose a significant coding overhead, complicating implementation of new types of analyses. We present an improved version of the constraint-based metabolic modeling package COBREXA, which utilizes a hierarchical model construction framework that decouples the implemented analysis algorithms into independent, yet re-combinable, building blocks. By removing the need to re-implement modeling components, assembly of complex metabolic models is simplified, which we demonstrate on use-cases of resource-balanced models, and enzyme-constrained flux balance models of interacting bacterial communities. Notably, these models show improved predictive capabilities in both monoculture and community settings. In perspective, the re-usable model-building components in COBREXA 2 provide a sustainable way to handle increasingly complex models in constraint-based modeling.</p><p><strong>Availability and implementation: </strong>COBREXA 2 is available from https://github.com/COBREXA/COBREXA.jl, and from Julia package repositories. COBREXA 2 works on all major operating systems and computer architectures. Documentation is available at https://cobrexa.github.io/COBREXA.jl/.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11842047/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143375005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DNAdesign: feature-aware in silico design of synthetic DNA through mutation.","authors":"Yingfei Wang, Jinsen Li, Tsu-Pei Chiu, Nicolas Gompel, Remo Rohs","doi":"10.1093/bioinformatics/btaf052","DOIUrl":"10.1093/bioinformatics/btaf052","url":null,"abstract":"<p><strong>Motivation: </strong>DNA sequence and shape readout represent different modes of protein-DNA recognition. Current tools lack the functionality to simultaneously consider alterations in different readout modes caused by sequence mutations. DNAdesign is a web-based tool to compare and design mutations based on both DNA sequence and shape characteristics. Users input a wild-type sequence, select sites to introduce mutations and choose a set of DNA shape parameters for mutation design.</p><p><strong>Results: </strong>DNAdesign utilizes Deep DNAshape to provide ultra-fast predictions of DNA shape based on extended k-mers and offers multiple encoding methods for nucleotide sequences, including the physicochemical encoding of DNA through their functional groups in the major and minor groove. DNAdesign provides all mutation candidates along the sequence and shape dimensions, with interactive visualization comparing each candidate with the wild-type DNA molecule. DNAdesign provides an approach to studying gene regulation and applications in synthetic biology, such as the design of synthetic enhancers and transcription factor binding sites.</p><p><strong>Availability and implementation: </strong>The DNAdesign webserver and documentation are freely accessible at https://dnadesign.usc.edu.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11825384/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143076691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}