Matei Teleman, Aurélie A G Gabriel, Léonard Hérault, David Gfeller
{"title":"SuperSpot: coarse graining spatial transcriptomics data into metaspots.","authors":"Matei Teleman, Aurélie A G Gabriel, Léonard Hérault, David Gfeller","doi":"10.1093/bioinformatics/btae734","DOIUrl":"10.1093/bioinformatics/btae734","url":null,"abstract":"<p><strong>Summary: </strong>Spatial Transcriptomics is revolutionizing our ability to phenotypically characterize complex biological tissues and decipher cellular niches. With current technologies such as VisiumHD, thousands of genes can be detected across millions of spots (also called cells or bins depending on the technologies). Building upon the metacell concept, we present a workflow, called SuperSpot, to combine adjacent and transcriptomically similar spots into \"metaspots\". The process involves representing spots as nodes in a graph with edges connecting spots in spatial proximity and edge weights representing transcriptomic similarity. Hierarchical clustering is used to aggregate spots into metaspots at a user-defined resolution. We demonstrate that metaspots reduce the size and sparsity of spatial transcriptomic data and facilitate the analysis of large datasets generated with the most recent technologies.</p><p><strong>Availability and implementation: </strong>SuperSpot is an R package available at https://github.com/GfellerLab/SuperSpot and archived on Zenodo (https://doi.org/10.5281/zenodo.14222088). 
The code to reproduce the figures is available at https://github.com/GfellerLab/SuperSpot/tree/main/figures (https://doi.org/10.5281/zenodo.14222088).</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11725322/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142808855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seonghwan Park, Min Young Kim, Jaewon Jeong, Sohae Yang, Minseok S Kim, Inkyu Moon
{"title":"Quantitative analysis of the dexamethasone side effect on human-derived young and aged skeletal muscle by myotube and nuclei segmentation using deep learning.","authors":"Seonghwan Park, Min Young Kim, Jaewon Jeong, Sohae Yang, Minseok S Kim, Inkyu Moon","doi":"10.1093/bioinformatics/btae658","DOIUrl":"10.1093/bioinformatics/btae658","url":null,"abstract":"<p><strong>Motivation: </strong>Skeletal muscle cells (skMCs) combine together to create long, multi-nucleated structures called myotubes. By studying the size, length, and number of nuclei in these myotubes, we can gain a deeper understanding of skeletal muscle development. However, human experimenters may often derive unreliable results owing to the unusual shape of the myotube, which causes significant measurement variability.</p><p><strong>Results: </strong>We propose a new method for quantitative analysis of the dexamethasone side effect on human-derived young and aged skeletal muscle by simultaneous myotube and nuclei segmentation using deep learning combined with post-processing techniques. The deep learning model outputs myotube semantic segmentation, nuclei semantic segmentation, and nuclei center, and post-processing applies a watershed algorithm to accurately distinguish overlapped nuclei and identify myotube branches through skeletonization. To evaluate the performance of the model, the myotube diameter and the number of nuclei were calculated from the generated segmented images and compared with the results calculated by human experimenters. In particular, the proposed model produced outstanding outcomes when comparing human-derived primary young and aged skMCs treated with dexamethasone. 
The proposed standardized and consistent automated image segmentation system for myotubes is expected to help streamline the drug-development process for skeletal muscle diseases.</p><p><strong>Availability and implementation: </strong>The code and the data are available at https://github.com/tdn02007/QA-skMCs-Seg.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11723526/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142928821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Castrense Savojardo, Matteo Manfredi, Pier Luigi Martelli, Rita Casadio
{"title":"DDGemb: predicting protein stability change upon single- and multi-point variations with embeddings and deep learning.","authors":"Castrense Savojardo, Matteo Manfredi, Pier Luigi Martelli, Rita Casadio","doi":"10.1093/bioinformatics/btaf019","DOIUrl":"10.1093/bioinformatics/btaf019","url":null,"abstract":"<p><strong>Motivation: </strong>The knowledge of protein stability upon residue variation is an important step for functional protein design and for understanding how protein variants can promote disease onset. Computational methods are important to complement experimental approaches and allow a fast screening of large datasets of variations.</p><p><strong>Results: </strong>In this work, we present DDGemb, a novel method combining protein language model embeddings and transformer architectures to predict protein ΔΔG upon both single- and multi-point variations. DDGemb has been trained on a high-quality dataset derived from literature and tested on available benchmark datasets of single- and multi-point variations. DDGemb performs at the state of the art in both single- and multi-point variations.</p><p><strong>Availability and implementation: </strong>DDGemb is available as web server at https://ddgemb.biocomp.unibo.it. Datasets used in this study are available at https://ddgemb.biocomp.unibo.it/datasets.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783275/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142973817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Mechtersheimer, Wenze Ding, Xiangnan Xu, Sanghyun Kim, Carolyn Sue, Yue Cao, Jean Yang
{"title":"IMPACT: interpretable microbial phenotype analysis via microbial characteristic traits.","authors":"Daniel Mechtersheimer, Wenze Ding, Xiangnan Xu, Sanghyun Kim, Carolyn Sue, Yue Cao, Jean Yang","doi":"10.1093/bioinformatics/btae702","DOIUrl":"10.1093/bioinformatics/btae702","url":null,"abstract":"<p><strong>Motivation: </strong>The human gut microbiome, consisting of trillions of bacteria, significantly impacts health and disease. High-throughput profiling through the advancement of modern technology provides the potential to enhance our understanding of the link between the microbiome and complex disease outcomes. However, there remains an open challenge where current microbiome models lack interpretability of microbial features, limiting a deeper understanding of the role of the gut microbiome in disease. To address this, we present a framework that combines a feature engineering step to transform tabular abundance data to image format using functional microbial annotation databases, with a residual spatial attention transformer block architecture for phenotype classification.</p><p><strong>Results: </strong>Our model, IMPACT, delivers improved predictive accuracy performance across multiclass classification compared to similar methods. More importantly, our approach provides interpretable feature importance through image classification saliency methods. This enables the extraction of taxa markers (features) associated with a disease outcome and also their associated functional microbial traits and metabolites.</p><p><strong>Availability and implementation: </strong>IMPACT is available at https://github.com/SydneyBioX/IMPACT. 
We provide direct installation of IMPACT via pip.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11687948/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142808854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xike Ouyang, Yannuo Feng, Chen Cui, Yunhe Li, Li Zhang, Han Wang
{"title":"Improving generalizability of drug-target binding prediction by pre-trained multi-view molecular representations.","authors":"Xike Ouyang, Yannuo Feng, Chen Cui, Yunhe Li, Li Zhang, Han Wang","doi":"10.1093/bioinformatics/btaf002","DOIUrl":"10.1093/bioinformatics/btaf002","url":null,"abstract":"<p><strong>Motivation: </strong>Most drugs start on their journey inside the body by binding the right target proteins. This is the reason that numerous efforts have been devoted to predicting the drug-target binding during drug development. However, the inherent diversity among molecular properties, coupled with limited training data availability, poses challenges to the accuracy and generalizability of these methods beyond their training domain.</p><p><strong>Results: </strong>In this work, we proposed a neural networks construction for high accurate and generalizable drug-target binding prediction, named Pre-trained Multi-view Molecular Representations (PMMR). The method uses pre-trained models to transfer representations of target proteins and drugs to the domain of drug-target binding prediction, mitigating the issue of poor generalizability stemming from limited data. Then, two typical representations of drug molecules, Graphs and SMILES strings, are learned respectively by a Graph Neural Network and a Transformer to achieve complementarity between local and global features. PMMR was evaluated on drug-target affinity and interaction benchmark datasets, and it derived preponderant performance contrast to peer methods, especially generalizability in cold-start scenarios. 
Furthermore, our state-of-the-art method was indicated to have the potential for drug discovery by a case study of cyclin-dependent kinase 2.</p><p><strong>Availability and implementation: </strong>https://github.com/NENUBioCompute/PMMR.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11751634/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142960265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PredCMB: predicting changes in microbial metabolites based on the gene-metabolite network analysis of shotgun metagenome data.","authors":"Jungyong Ji, Sungwon Jung","doi":"10.1093/bioinformatics/btaf020","DOIUrl":"10.1093/bioinformatics/btaf020","url":null,"abstract":"<p><strong>Motivation: </strong>Microbiota-derived metabolites significantly impact host biology, prompting extensive research on metabolic shifts linked to the microbiota. Recent studies have explored both direct metabolite analyses and computational tools for inferring metabolic functions from microbial shotgun metagenome data. However, no existing tool specifically focuses on predicting changes in individual metabolite levels, as opposed to metabolic pathway activities, based on shotgun metagenome data. Understanding these changes is crucial for directly estimating the metabolic potential associated with microbial genomic content.</p><p><strong>Results: </strong>We introduce Predicting Changes in Microbial metaBolites (PredCMB), a novel method designed to predict alterations in individual metabolites between conditions using shotgun metagenome data and enzymatic gene-metabolite networks. PredCMB evaluates differential enzymatic gene abundance between conditions and estimates its influence on metabolite changes. To validate this approach, we applied it to two publicly available datasets comprising paired shotgun metagenomics and metabolomics data from inflammatory bowel disease cohorts and the cohort of gastrectomy for gastric cancer. Benchmark evaluations revealed that PredCMB outperformed a previous method by demonstrating higher correlations between predicted metabolite changes and experimentally measured changes. Notably, it identified metabolite classes exhibiting major alterations between conditions. 
By enabling the prediction of metabolite changes directly from shotgun metagenome data, PredCMB provides deeper insights into microbial metabolic dynamics than existing methods focused on pathway activity evaluation. Its potential applications include refining target metabolite selection in microbial metabolomic studies and assessing the contributions of microbial metabolites to disease pathogenesis.</p><p><strong>Availability and implementation: </strong>Freely available to non-commercial users at https://www.sysbiolab.org/predcmb.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11771765/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143018152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Darius P Schaub, Behnam Yousefi, Nico Kaiser, Robin Khatri, Victor G Puelles, Christian F Krebs, Ulf Panzer, Stefan Bonn
{"title":"PCA-based spatial domain identification with state-of-the-art performance.","authors":"Darius P Schaub, Behnam Yousefi, Nico Kaiser, Robin Khatri, Victor G Puelles, Christian F Krebs, Ulf Panzer, Stefan Bonn","doi":"10.1093/bioinformatics/btaf005","DOIUrl":"10.1093/bioinformatics/btaf005","url":null,"abstract":"<p><strong>Motivation: </strong>The identification of biologically meaningful domains is a central step in the analysis of spatial transcriptomic data.</p><p><strong>Results: </strong>Following Occam's razor, we show that a simple PCA-based algorithm for unsupervised spatial domain identification rivals the performance of ten competing state-of-the-art methods across six single-cell spatial transcriptomic datasets. Our reductionist approach, NichePCA, provides researchers with intuitive domain interpretation and excels in execution speed, robustness, and scalability.</p><p><strong>Availability and implementation: </strong>The code is available at https://github.com/imsb-uke/nichepca.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11761416/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142960269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SampleExplorer: using language models to discover relevant transcriptome data.","authors":"Wee Loong Chin, Timo Lassmann","doi":"10.1093/bioinformatics/btae759","DOIUrl":"10.1093/bioinformatics/btae759","url":null,"abstract":"<p><strong>Motivation: </strong>Over the last two decades, transcriptomics has become a standard technique in biomedical research. We now have large databases of RNA-seq data, accompanied by valuable metadata detailing scientific objectives and the experimental procedures used. The metadata is crucial in understanding and replicating published studies, but so far has been underutilized in helping researchers to discover existing datasets.</p><p><strong>Results: </strong>We present SampleExplorer, a tool allowing researchers to search for relevant data using both text and gene set queries. SampleExplorer embeds sample metadata and uses a transformer-based language model to retrieve similar datasets. Extensive benchmarking (see Supplementary Materials and Methods) using the ARCHS4 database demonstrates that SampleExplorer provides an effective approach for retrieving biologically relevant samples from large-scale transcriptomic data. This tool provides an efficient approach for discovering relevant gene expression datasets in large public repositories. It improves sample and dataset identification across diverse experimental contexts, helping researchers leverage existing transcriptomic data for potential replication or verification studies.</p><p><strong>Availability and implementation: </strong>SampleExplorer is available as a Python package compatible with versions 3.9 to 3.11, available for installation via the Python Package Index (PyPI). The codebase and documentation are accessible at https://github.com/wlchin/SampleExplorer. 
Supplementary data (Supplementary Materials and Methods) provides detailed methodological information, including an algorithmic description of the retrieval process and data preparation steps.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11751629/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142960236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating single-cell multimodal epigenomic data using 1D convolutional neural networks.","authors":"Chao Gao, Joshua D Welch","doi":"10.1093/bioinformatics/btae705","DOIUrl":"10.1093/bioinformatics/btae705","url":null,"abstract":"<p><strong>Motivation: </strong>Recent experimental developments enable single-cell multimodal epigenomic profiling, which measures multiple histone modifications and chromatin accessibility within the same cell. Such parallel measurements provide exciting new opportunities to investigate how epigenomic modalities vary together across cell types and states. A pivotal step in using these types of data is integrating the epigenomic modalities to learn a unified representation of each cell, but existing approaches are not designed to model the unique nature of this data type. Our key insight is to model single-cell multimodal epigenome data as a multichannel sequential signal.</p><p><strong>Results: </strong>We developed ConvNet-VAEs, a novel framework that uses one-dimensional (1D) convolutional variational autoencoders (VAEs) for single-cell multimodal epigenomic data integration. We evaluated ConvNet-VAEs on nano-CUT&Tag and single-cell nanobody-tethered transposition followed by sequencing data generated from juvenile mouse brain and human bone marrow. We found that ConvNet-VAEs can perform dimension reduction and batch correction better than previous architectures while using significantly fewer parameters. Furthermore, the performance gap between convolutional and fully connected architectures increases with the number of modalities, and deeper convolutional architectures can increase the performance, while the performance degrades for deeper fully connected architectures. 
Our results indicate that convolutional autoencoders are a promising method for integrating current and future single-cell multimodal epigenomic datasets.</p><p><strong>Availability and implementation: </strong>The source code of VAE models and a demo in Jupyter notebook are available at https://github.com/welch-lab/ConvNetVAE.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11751632/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143018188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhe Xue, Chenwei Sun, Wenhao Zheng, Jiancheng Lv, Xianggen Liu
{"title":"TargetSA: adaptive simulated annealing for target-specific drug design.","authors":"Zhe Xue, Chenwei Sun, Wenhao Zheng, Jiancheng Lv, Xianggen Liu","doi":"10.1093/bioinformatics/btae730","DOIUrl":"10.1093/bioinformatics/btae730","url":null,"abstract":"<p><strong>Motivation: </strong>The burgeoning field of target-specific drug design has attracted considerable attention, focusing on identifying compounds with high binding affinity toward specific target pockets. Nevertheless, existing target-specific deep generative models encounter notable challenges. Some models heavily rely on elaborate datasets and complicated training methodologies, while others neglect the multi-constraint optimization problem inherent in drug design, resulting in generated molecules with irrational structures or chemical properties.</p><p><strong>Results: </strong>To address these issues, we propose a novel framework (TargetSA) that leverages adaptive simulated annealing (SA) for target-specific molecular generation and multi-constraint optimization. The SA process explores the discrete structural space of molecules, progressively converging toward the optimal solution that fulfills the predefined objective. To propose novel compounds, we first predict promising editing positions based on historical experience, and then iteratively edit molecular graphs through four operations (insertion, replacement, deletion, and cyclization). Together, these operations collectively constitute a complete operation set, facilitating a thorough exploration of the drug-like space. Furthermore, we introduce a reversible sampling strategy to re-accept currently suboptimal solutions, greatly enhancing the generation quality. 
Empirical evaluations demonstrate that TargetSA achieves state-of-the-art performance in generating high-affinity molecules (average vina dock -9.09) while maintaining desirable chemical properties.</p><p><strong>Availability and implementation: </strong>https://github.com/XueZhe-Zachary/TargetSA.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142831218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}